Core Data batch updates, introduced in iOS 8 and OS X Yosemite, address a long-standing limitation of the Core Data stack that developers had been asking Apple to fix for many years. Let's review the problem that batch updates solve, how they work, and an alternative that involves rethinking your data normalization strategy.
The problem with Core Data
As Brent Simmons, at the time the developer of the successful RSS reader NetNewsWire for OS X, wrote a couple of years ago, Core Data can easily become a performance bottleneck in apps that need to apply a basic operation, such as changing an attribute value or deleting an object, to a large set of objects. The reason for this, according to Brent and others, lies in Core Data being an object graph persistence manager.
As Marcus S. Zarra, author of "Core Data. Data Storage and Management for iOS, OS X, and iCloud", put it, while in the database world such operations are "very, very easy to do no matter how many" objects you have, they become "painful" with an object graph because they require you "to load in all those records, change literally one bit, and then write all those records back out again."
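To make that cost concrete, here is a minimal sketch of the load, modify, and save pattern Zarra describes. The entity name "Item" and the helper function name are hypothetical and chosen only for illustration; the "done" attribute mirrors the one used in the batch update example later in this article.

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper: marks every "Item" object as done WITHOUT batch updates.
// "Item" is an assumed entity name, not one taken from any real project.
static BOOL MarkAllItemsDone(NSManagedObjectContext *context, NSError **error) {
    NSFetchRequest *fetchRequest = [NSFetchRequest fetchRequestWithEntityName:@"Item"];

    // Every matching record is pulled from the store and turned into a managed object.
    NSArray *items = [context executeFetchRequest:fetchRequest error:error];
    if (items == nil) {
        return NO;
    }

    // Change "literally one bit" on each object in memory.
    for (NSManagedObject *item in items) {
        [item setValue:@YES forKey:@"done"];
    }

    // All modified objects are then written back to the persistent store.
    return [context save:error];
}
```

With a few hundred objects this is unlikely to matter, but memory use and save time grow with the number of objects fetched, which is the scenario batch updates target.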
Brent Simmons ended up switching away from Core Data and using SQLite directly, but the newly introduced batch updates aim to offer an alternative.
Core Data batch updates
The steps required to execute a batch update are shown below:
    //-- Create an NSBatchUpdateRequest for your entity
    NSBatchUpdateRequest *batchUpdateRequest = [[NSBatchUpdateRequest alloc] initWithEntity:entityDescription];

    //-- Our batch update will return an array of object IDs
    [batchUpdateRequest setResultType:NSUpdatedObjectIDsResultType];

    //-- Configure Batch Update Request
    [batchUpdateRequest setPropertiesToUpdate:@{ @"done" : @YES }];

    //-- Execute Batch Request
    NSError *batchUpdateRequestError = nil;
    NSBatchUpdateResult *batchUpdateResult = (NSBatchUpdateResult *)[self.managedObjectContext executeRequest:batchUpdateRequest error:&batchUpdateRequestError];
The above snippet of code comes from a full step-by-step tutorial by Bart Jacobs that shows how to use batch updates in an iOS 8 app. As Bart notes, batch updates bypass the managed object context altogether and write directly to the persistent store, so another required step is turning the affected objects in your current contexts into faults, which forces them to reload their values from the store:
    NSArray *objectIDs = batchUpdateResult.result;

    for (NSManagedObjectID *objectID in objectIDs) {
        // Turn Managed Objects into Faults
        NSManagedObject *managedObject = [self.managedObjectContext objectWithID:objectID];

        if (managedObject) {
            [self.managedObjectContext refreshObject:managedObject mergeChanges:NO];
        }
    }
Bypassing the managed object context has another consequence: no validation is performed on data written to the persistent store in a batch update, so the responsibility for not writing invalid data lies with the developer.
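One way to take on that responsibility is to check the values yourself before issuing the request. The following is only a sketch under stated assumptions: the helper name is hypothetical, the check shown (that "done" exists and is a Boolean attribute) is just one example of a hand-written sanity check, and the predicate is an optional refinement that limits the update to objects that actually need it.

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper: runs the batch update only after a manual sanity check.
// Returns NO without populating *error when the check itself fails.
static BOOL BatchMarkDone(NSManagedObjectContext *context,
                          NSEntityDescription *entity,
                          NSError **error) {
    // Batch updates skip Core Data validation, so verify by hand that
    // "done" really is a Boolean attribute of this entity.
    NSAttributeDescription *attribute = entity.attributesByName[@"done"];
    if (attribute == nil || attribute.attributeType != NSBooleanAttributeType) {
        return NO;
    }

    NSBatchUpdateRequest *request = [[NSBatchUpdateRequest alloc] initWithEntity:entity];

    // Only touch rows whose flag is not already set.
    request.predicate = [NSPredicate predicateWithFormat:@"done == NO"];
    request.propertiesToUpdate = @{ @"done" : @YES };

    // Ask only for the number of updated rows; use
    // NSUpdatedObjectIDsResultType instead if you need to refresh objects.
    request.resultType = NSUpdatedObjectsCountResultType;

    NSBatchUpdateResult *result =
        (NSBatchUpdateResult *)[context executeRequest:request error:error];
    return result != nil;
}
```

Any business rules that would normally live in your model's validation methods would need a similar explicit check, since they are not invoked on this path.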
Critical remarks
Marcus S. Zarra takes a rather critical view of batch updates. His basic stance is that Core Data should be considered and handled as an object graph that also happens to be able to persist the objects it manages. The main implication is that, if you ever need to "update 10 million records just to change a bit," there is possibly something to fix at the object design level. If one bit of state in your attributes might require such a batch update, then a possible alternative is to use "a higher order of that state," so that you change it in just one place.
Taking Brent Simmons' case as an example, says Marcus, that would mean storing a date somewhere, instead of setting the "read" flag on each feed entry, so that you know to ignore all entries older than that date. This leads Marcus to his second suggestion for Core Data: "Core Data is not a database, again. I will say that a lot. We actually want to denormalize the data as much as possible. If it’s a calculated field, we don’t want to have to calculate that every time because it’s an object, so we want to store that calculation in the database."
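The cutoff date idea could be sketched roughly as follows. Everything named here is an assumption made for illustration: the "Entry" entity, its "publishedDate" and "isRead" attributes, the defaults key, and the choice of NSUserDefaults as the "somewhere" the date is stored. Neither Zarra nor Simmons prescribes these details.

```objc
#import <CoreData/CoreData.h>

// Hypothetical defaults key under which the cutoff date is stored.
static NSString * const ReadCutoffKey = @"ReadCutoffDate";

// "Mark all as read" becomes a single write instead of one write per entry.
static void MarkAllEntriesRead(void) {
    [[NSUserDefaults standardUserDefaults] setObject:[NSDate date]
                                              forKey:ReadCutoffKey];
}

// Unread entries are those newer than the cutoff that have not been
// marked as read individually.
static NSFetchRequest *UnreadEntriesFetchRequest(void) {
    NSDate *cutoff = [[NSUserDefaults standardUserDefaults] objectForKey:ReadCutoffKey];
    if (cutoff == nil) {
        // No cutoff stored yet: treat every entry as a candidate.
        cutoff = [NSDate distantPast];
    }

    NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Entry"];
    request.predicate = [NSPredicate predicateWithFormat:
                         @"publishedDate > %@ AND isRead == NO", cutoff];
    return request;
}
```

The per-entry flag is still there for reading individual items, but the bulk operation no longer touches any entry, so there is nothing to load, rewrite, or refresh.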
Another issue that Marcus sees with batch updates is the need for refreshing all of the objects that are in use by the app. This can be as easy as telling, e.g., each table view to reload its data, but if the "application is holding on to objects and we’ve got a lot of strong references and things like that, then we get into some more hairy code."