The ADO.NET Team recently discussed various performance aspects of the ADO.NET Entity Framework. The ADO.NET Entity Framework entered its third beta back in December, and since that time the team has focused on showing developers how to use the framework; now it is turning to the framework's performance characteristics.
The article examines the performance of the ADO.NET Entity Framework by breaking down the stack, showing how to speed up a simple query, and explaining the performance characteristics of the framework.
It's important to point out that when a layer of abstraction such as the Entity Data Model (EDM) is used to transform the relational schema of a database, some performance cost is unavoidable.
The Query and Results
The article uses the NorthWind database for the model and creates a very simple query:
using (NorthwindEntities ne = new NorthwindEntities())
{
    foreach (Order o in ne.Orders)
    {
        int i = o.OrderID;
    }
}
The test was run for 10 iterations over a total of 848 rows for each query. The results are interesting: the first run took 4241 ms, while each subsequent run averaged around 13 ms. A large part of the first-run cost is the creation of the ObjectContext; in addition, several expensive one-time operations occur when the first operation that accesses the database executes.
Breaking down each operation by percentage gives us some insight:
- Loading Metadata (11%)
- Initializing Metadata (14%)
- Opening Connection (8%)
- View Generation (56%)
- Load Assembly (2%)
- Tracking (1%)
- Materialization (7%)
- Misc (1%)
By far the biggest percentage of time is View Generation at a whopping 56%. When View Generation is the primary time cost, developers can use the EDM generator (EdmGen.exe) command-line tool with the view generation mode parameter (/mode:ViewGeneration); the output is a code file (C# or VB.NET) that can be included in the project. Having the views pre-generated reduces the startup time to 2933 ms, about a 28% decrease in the overall time for the iteration. Generating the views and distributing them with the application is a good way to gain performance. The downside is that the views are no longer dynamic and need to be regenerated and kept synchronized whenever the model changes.
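As a rough sketch, pre-generating the views from the model's metadata files might look like the command below; the NorthwindModel file names are placeholders for your model's actual .csdl/.ssdl/.msl files:

```shell
EdmGen.exe /mode:ViewGeneration ^
    /incsdl:NorthwindModel.csdl ^
    /inssdl:NorthwindModel.ssdl ^
    /inmsl:NorthwindModel.msl ^
    /outviews:NorthwindModel.Views.cs ^
    /language:CSharp
```

The generated NorthwindModel.Views.cs file is then added to the project and compiled into the application, so the framework can skip view generation at run time.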
Query Performance
It's pointed out that a major design element for performance is the query cache. Once a query is executed, parts of it are maintained in a global cache. Because of this query and metadata caching, the second run always executes faster than the first. For example, consider this Entity SQL query:
using (PerformanceArticleContext ne = new PerformanceArticleContext())
{
    ObjectQuery<Orders> orders = ne.CreateQuery<Orders>("Select value o from Orders as o");
    foreach (Orders o in orders)
    {
        int i = o.OrderID;
    }
}
The first run of this query takes 179 ms, but the next run takes only 15 ms. The difference between the initial and subsequent executions is the cost of building the command tree that gets passed down to the provider for execution.
LINQ queries are similar to Entity SQL queries in the way they execute. For example, consider the query below:
using (PerformanceArticleContext ne = new PerformanceArticleContext())
{
    var orders = from order in ne.Orders
                 select order;
    foreach (Orders o in orders)
    {
        int i = o.OrderID;
    }
}
The execution of the LINQ query takes 202 ms initially and 18 ms on subsequent executions, still slightly slower than Entity SQL. Compiled LINQ queries can improve performance further: the advantage of compiling a LINQ query is that the expression tree is built when the query is compiled and doesn't need to be rebuilt on subsequent executions. The code for the compiled LINQ query looks like this:
public static Func<PerformanceArticleContext, IQueryable<Orders>> compiledQuery =
    CompiledQuery.Compile<PerformanceArticleContext, IQueryable<Orders>>(
        ne => from order in ne.Orders
              select order);
Notice that the compiled query is stored as a delegate taking a PerformanceArticleContext as its parameter. The execution time for the compiled LINQ query is 305 ms on the first execution and 15 ms on subsequent ones. The results are not amazing, but it's interesting to note the 3 ms decrease in execution time for the compiled LINQ query compared to the regular one; that's unimportant for a few queries but has value when performing thousands of queries.
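Executing the compiled query is then just a matter of invoking the delegate with a context instance. A minimal sketch, assuming a compiledQuery delegate field like the one above and the same model from the article:

```csharp
using (PerformanceArticleContext ne = new PerformanceArticleContext())
{
    // Invoke the pre-compiled delegate; the expression tree is reused
    // rather than being rebuilt for each execution.
    foreach (Orders o in compiledQuery(ne))
    {
        int i = o.OrderID;
    }
}
```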
The ADO.NET team suggests being careful with the Tracking/NoTracking merge options in your queries:
In the previous examples, all the queries result in the creation of an object that gets added to the ObjectStateManager so that we can track updates. When it is not important to track updates or deletes to objects, then executing queries using the NoTracking merge option may be a better option. For example, NoTracking may be a good option in an ASP.NET web application that queries for a specific category name but doesn’t make updates to the returned data. In a case like this, there is a performance benefit to using NoTracking queries.
Based on the numbers, the NoTracking option provides a large reduction in execution time; most of the gain comes from no longer tracking changes and managing relationships. For a NoTracking query, the compiled LINQ query outperforms the standard LINQ query both on first execution and on subsequent executions. Note that the second execution of the compiled LINQ query is equivalent to the second execution of the Entity SQL query.
The ADO.NET team also suggests keeping a few things in mind when creating queries:
When optimizing query performance in the Entity Framework, you should consider what works best for your particular programming scenario. Here are a few key takeaways:
- Initial creation of the ObjectContext includes the cost of loading and validating the metadata.
- Initial execution of any query includes the costs of building up a query cache to enable faster execution of subsequent queries.
- Compiled LINQ queries are faster than non-compiled LINQ queries.
- Queries executed with a NoTracking merge option work well for streaming large data objects or when changes and relationships do not need to be tracked.
For more information on ADO.NET and the Entity Framework, please check out the ADO.NET Team Blog.