Entity Framework Core (EF Core) offers significant performance improvements over Entity Framework 6 (EF 6). Nowhere is this more apparent than when dealing with large collections.
When working with EF 6, a problem you may have experienced is context fouling: the longer you use a single DbContext object, the slower it becomes. This can be hard to see during testing if the number of records is small, but the cost increases proportionally to the number of objects stored in the change tracking system.
For this test, 10,000 rows were inserted into a table with eight columns on a local server. When performed as a single batch, the insert took over 160 seconds. Since this is a local server, network latency is negligible.
Using two batches of 5,000 reduced the total time to 82 seconds. Batches of 1,000 further reduce the run time to roughly 19 seconds.
This progression continues down to approximately 100 batches of 100 rows each, which take only 1.4 seconds. Below that, the run time starts increasing again, with a batch size of 25 requiring 6.3 seconds. Coincidentally, a batch size of 250 likewise takes about six and a half seconds.
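To make the batching pattern concrete, here is a minimal sketch of the kind of loop used for these timings. The Record entity, the TestContext class, and the connection string are hypothetical stand-ins (the real table had eight columns), and the sketch is written against EF Core; the equivalent EF 6 code differs only in how the context is configured.

    using System.Collections.Generic;
    using System.Linq;
    using Microsoft.EntityFrameworkCore;

    // Hypothetical entity standing in for the eight-column test table.
    public class Record
    {
        public int Id { get; set; }
        public string Name { get; set; }
        // remaining columns omitted for brevity
    }

    // Hypothetical context; the connection string is illustrative only.
    public class TestContext : DbContext
    {
        public DbSet<Record> Records { get; set; }

        protected override void OnConfiguring(DbContextOptionsBuilder options)
            => options.UseSqlServer(@"Server=(local);Database=BatchTest;Trusted_Connection=True;");
    }

    public static class BatchInserter
    {
        // Insert rows in batches, creating a new context per batch so the change
        // tracker never holds more than batchSize entities at a time.
        public static void InsertInBatches(IReadOnlyList<Record> rows, int batchSize)
        {
            for (int i = 0; i < rows.Count; i += batchSize)
            {
                using (var context = new TestContext())
                {
                    foreach (var row in rows.Skip(i).Take(batchSize))
                        context.Records.Add(row);

                    context.SaveChanges();
                }
            }
        }
    }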
In the timings above, a new DbContext was created for each batch, as in the sketch. To determine whether the slowdown was actually caused by context fouling, the same test was run again with one modification: instead of creating a new DbContext for each batch, a single DbContext was used, with SaveChanges called between batches. As the line labeled “EF 6 B” shows in the chart below, how often you call SaveChanges has no effect.
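For comparison, the single-context variant behind the “EF 6 B” line looks roughly like this, reusing the hypothetical types from the sketch above. In EF 6, entities remain attached to the context after SaveChanges, so the change tracker keeps growing regardless of how the inserts are split up.

    public static class SingleContextInserter
    {
        // Variant "EF 6 B": one long-lived context with SaveChanges called per batch.
        // Saved entities stay in the change tracker, so each batch is slower than the last.
        public static void InsertWithSingleContext(IReadOnlyList<Record> rows, int batchSize)
        {
            using (var context = new TestContext())
            {
                for (int i = 0; i < rows.Count; i += batchSize)
                {
                    foreach (var row in rows.Skip(i).Take(batchSize))
                        context.Records.Add(row);

                    context.SaveChanges();
                }
            }
        }
    }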
When looking at the chart, you may have noticed that EF Core appears to produce a flat line. This is just an illusion caused by the dramatic increase in run time that EF 6 experiences as the batch size grows. Zooming in, you can see that batch size does have a small effect on EF Core.
Again, a batch size of roughly 100 improves performance. But the drop from the worst case of 2 seconds to a best case of 1.4 seconds may not be significant enough to matter.
Does this mean you should automatically start using batches of 100? Not necessarily. As mentioned above, these tests used a local database. Once you add the cost of network latency, batch sizes should be increased to reduce the number of round-trips to the database.
In the interest of completeness, ADO.NET without an ORM was also tested. Using multi-value insert statements with 250 rows per batch, it was able to complete the 10,000-row test in as little as 1.2 to 1.7 seconds.
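As a rough illustration of that approach (the table name, column, and Record type are carried over from the hypothetical sketches above, and this is not necessarily how the original benchmark was written), each command folds one batch of rows into a single multi-value INSERT statement:

    using System.Collections.Generic;
    using System.Data.SqlClient;
    using System.Text;

    public static class MultiValueInserter
    {
        // Insert one batch of rows with a single "INSERT ... VALUES (...), (...), ..." command.
        // Each row contributes one parameter per column, and SQL Server caps a command
        // at 2,100 parameters, so the batch size cannot grow without limit.
        public static void InsertBatch(SqlConnection connection, IReadOnlyList<Record> batch)
        {
            var sql = new StringBuilder("INSERT INTO Records (Name) VALUES ");

            using (var command = new SqlCommand { Connection = connection })
            {
                for (int i = 0; i < batch.Count; i++)
                {
                    if (i > 0) sql.Append(", ");
                    sql.Append($"(@name{i})");
                    command.Parameters.AddWithValue($"@name{i}", batch[i].Name);
                }

                command.CommandText = sql.ToString();
                command.ExecuteNonQuery();
            }
        }
    }

With the eight-column table from the test, 250 rows per statement works out to 2,000 parameters, comfortably under that limit.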
For even better performance, we can turn to database-specific techniques such as SqlBulkCopy. With 10,000 rows, the run time was only 0.059 seconds. Even with 100,000 rows, the run time only increased to 0.647 seconds.
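A minimal SqlBulkCopy sketch looks like the following; the destination table name is illustrative, and a DataTable is used as the source, although SqlBulkCopy can also read from an IDataReader.

    using System.Data;
    using System.Data.SqlClient;

    public static class BulkInserter
    {
        // Stream rows to the server in one bulk operation instead of individual INSERTs.
        // The DataTable's columns are expected to match the destination table.
        public static void BulkInsert(string connectionString, DataTable rows)
        {
            using (var bulkCopy = new SqlBulkCopy(connectionString))
            {
                bulkCopy.DestinationTableName = "dbo.Records"; // illustrative name
                bulkCopy.BatchSize = 10000;                    // rows copied to the server per batch
                bulkCopy.WriteToServer(rows);
            }
        }
    }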