The Art of SQL Server Database Administration, Development, and Career Skills for the Technically Minded
As a database administrator I’m most concerned about what’s happening in production – not how many lines of code were saved in development. For this reason I started researching EF on my own to discover why it’s workloads always seemed so … well, … wonky … on my end. My reasoning wasn’t to demonstrate why EF sucks – it doesn’t – but to answer this simple question. Can it save lines of code AND send better TSQL? Hint: There are practices to avoid, see my other post on Entity Framework, but often the answer is yes.
Today I’d like to dig into an area from the first post a little deeper.
FOR EACH LOOPS = CURSORS (ANOTHER LOOK)
There is a pretty cool feature of Entity Framework. When you declare an object mapped to a table that has foreign keys, you automatically get references to those related entities. Navigating relations in this way really is technically very clever – hats off to the folks who worked this code out at Microsoft. Unfortunately it can have a significant hidden cost when buried within loops.
As I’ve complained before, Entity Framework abstracts the TSQL workloads that get passed back to the SQL Server. Often, this means that the developer might not be aware of how her/his code is being received by the server.
What follows is a perfect example.
In the AdventureWorks sample database the SalesOrderDetail table is a child table of SalesOrderHeader. So, if I wanted to see all the product id’s that a single customer ordered – I would need to join Customer table to SalesOrderHeader table and then join the SalesOrderHeader table to the SalesOrderDetail table. In raw TSQL this is a pretty simple join, but in Entity Framework I may be tempted to write it like this and let EF navigate the relationships.
AdventureWorks2012Entities DBContext = new AdventureWorks2012Entities(); var custorder = from o in DBContext.SalesOrderHeaders where o.CustomerID.Equals(customer) select o; foreach (var ordheader in custorder) { foreach (var orddetail in ordheader.SalesOrderDetails) { Console.WriteLine(orddetail.ProductID.ToString()); } }
In my previous post I said to avoid LINQ within FOR EACH loops. As you can see in this example however I am not issuing a LINQ statement – so, it’s not a clear violation of that code smell. What EF is doing is navigating relationships within the entity mappings to find related objects to the original LINQ statement. Like I said, pretty cool, but what is the cost?
Take a look at this code in a trace. In the following image you can see the console result of my code. 33 product IDs were returned. To obtain the ProductID by navigating the relationships, Entity Framework issued 77 individual interactions with SQL Server. 33 of these statements were select * statements against SalesOrderDetail. All but one of the columns from each select statement are never even used. In short, it is basically a cursor.
If you take one of these statements and run manually you find that each statement accounts for 600 bytes. 600 bytes x 33 runs = ~19 KB.
What if we’d of written it like this?
AdventureWorks2012Entities DBContext = new AdventureWorks2012Entities(); var custorder = from d in DBContext.SalesOrderDetails join o in DBContext.SalesOrderHeaders on d.SalesOrderID equals o.SalesOrderID where o.CustomerID.Equals(customer) select new {d.ProductID}; // projection foreach (var prodord in custorder) { Console.WriteLine(prodord.ProductID); }
We’ve avoided burying EF constructs within the FOR EACH loop and returned to using LINQ in a set based way. The resulting trace now looks like this. Notice I’m also using projection to get away from the select * problem.
Further the total number of bytes retrieved from the server is now 339 BYTES. That’s one single select statement pulling 339 BYTES vs 33 individual statements pulling 19 KB. That is 339 vs 19000 if you put them both in terms of bytes. The second run pulls 2% of the total data retrieved from the first run!!! In any type of scaled environment that adds up!!
The moral of the story is this. If you are using Entity Framework to reduce lines of code and work in an object oriented / strongly typed environment – great, I support you. But, to get it to work quickly and efficiently don’t ever forget that SQL Server still wants set based workloads.
Entity Framework can send set based work – but you have to know how to help it out. The purpose of this post also isn’t to dissuade you from letting EF navigate relationships – just know what the cost is – ESPECIALLY if you’re doing work inside loops that may scale to hundreds, thousands, millions of iterations.
Want your DBA to stop complaining about Entity Framework? Make the case by writing EF code that sends better TSQL.
In my next Entity Framework performance post we’ll take a look at how EF can cause SQL Server to make poor index decisions in the query plan – and what can be done about that.
Pingback: (SFTW) SQL Server Links 27/02/15 - John Sansom