The Art of SQL Server Database Administration, Development, and Career Skills for the Technically Minded
ATTENTION!! Some time after writing this original post I joined Pluralsight as an author. My very first course was a 2 hour collection of modules on this very topic. If you have a Pluralsight subscription I highly recommend the much deeper dive into EF performance that you can find here. Entity Framework Database Performance Anti-patterns
An ORM (object relational mapper) can make the task of developing code against the data tier much easier for the Software Engineer. One of the more popular ORMs is Entity Framework. EF has come a long way. Each release gets a little better, in terms of performance however, is still far from perfect. EF can be a real dog.
Slow or not, the reality remains. The typical Software Engineer would rather write classes, functions, and code in C#, F#, Java, or other language of choice over TSQL. They crave strongly typed environments, hip design patterns, diet coke, and bacon. TSQL to them is like assembler of old. Fast execution? You bet! Also archaic, slow to develop, and often unnecessary (their words, not mine). Those that do enjoy TSQL end up as full time DBAs, data architects, or consultants.
As a DBA working in a world where EF resides these are the 5 things I think both DBAs and Software Engineers should understand about EF.
1. Size Matters – Maybe More Than Ever
A big complaint against Entity Framework is that it brings back more data than it actually needs. If this problem is compounded by improper column sizes and data types, you can easily be drawing exponentially more data than needed to satisfy business requirements. If this extra data causes a query to have to go to disk vs memory, the performance impact can be orders of magnitude above that.
Software Engineers should understand this concept even if they never plan to write another TSQL statement again. Code First doesn’t relieve anyone the responsibility of a sound understanding of relational architecture and data type/length choices.
Code first is a concept in Entity Framework where an SWE can design a table structure declaratively within the language (similar to the way you would declare variables, classes, and objects). EF will create the database objects it’s self. This can make the database development process more comfortable for the developer but abstraction often masks architectural weaknesses.
2. Don’t Query More Than You Need – Understand Projection
Even if your data types and column sizes are perfectly scoped and sized Entity Framework makes it simple to over-work your database server by constantly requesting more columns than needed. DBAs regularly preach that users should avoid ” select * “. Googling Entity Framework code samples or perusing MSDN however gives the average EF programmer an endless supply of code that is guilty of this very thing. Consider the following EF C# code taken directly from an MSDN example on querying with EF.
// Create the query. var studentQuery = from student in students where student.Scores[0] > 90 select student; // Execute the query. foreach (Student student in studentQuery) { Console.WriteLine("{0}, {1}", student.Last, student.First); } // Output: // Omelchenko, Svetlana // Garcia, Cesar // Fakhouri, Fadi;
If you were to trace this connection on the SQL Server end you’d see the query selecting every single column even though the sample business case only cares about first and last name. This can be overcome with projection.
Projection is basically taking the members of one object and projecting them as another, often smaller, object. Were the sample EF code above to use projection, it would be something more like this:
var studentQuery = from student in students where student.Scores[0] > 90 select new {fname = student.First, lname = student.Last};
The resulting generated TSQL would select lname, fname only. EF leverages the projection and recognizes nothing else is desired when generating the TSQL that is ultimately passed to the database server. (Image in tip 3 shows this as well.)
3. Obtaining TSQL for Tuning
To truly performance tune a query, the DBA or DBE doing the tuning isn’t going to want to look at LINQ syntax. Converting this to the actual TSQL has gotten much easier in modern versions of EF. Developers don’t have to rely on trace alone.
To see, for example, the TSQL that a LINQ query would generate – the dev would simply need to do a breakpoint, screen print, or log the dataset variable via it’s .ToString() method. .Net returns the EF generated TSQL as shown here.
Older EF versions could do this as well, but not through the ToString() method. While doable, it wasn’t always very straight forward, requiring different class objects and specific circumstances to perform.
Once the DBA has tuned the TSQL statement it’s decision time. Implement it back into EF in the form of an improved LINQ query or hit the connection with raw TSQL. EF supports communicating in raw TSQL or calling procs in these edge cases when desired performance can’t be reached any other way. If the decision is to turn the TSQL back into LINQ, there is a translation tool that makes this fairly easy. It’s called StackOverflow.com.
4. LINQ Inside For Each Loops = Cursor
If there is one convention within TSQL where developers and DBAs often butt heads, it’s the cursor. Cursors are a TSQL indicator to the DBA that someone, somewhere, gave up. To many developers however, iterative loops is just how things are done. SQL Server works best in sets. Period. Looping logic “row by agonizing row” is a sure fire recipe for performance complaints. EF is getting better about sending set based work, but one area that still essentially creates a client side cursor is nesting LINQ Queries inside For Each (or other type) loops.
Entity Framework abstracts TSQL, so the developer often doesn’t know exactly what their code is doing on the SQL Server end. It may be up to the DBA then to remind developers to avoid building LINQ queries inside of loops. Notice how the following code looks in a trace for example. (Also notice how lack of projection causes an even bigger problem by selecting every single column).
var products = from s in DBcontext.SalesOrderDetails group s by s.ProductID into g select new { ProdID = g.Key, AveragePrice = g.Average( p => p.UnitPrice * p.OrderQty) }; foreach (var product in products) { if (product.AveragePrice > 1000) { // try not to do this. linq query in loop = slow var rec = (from u in DBcontext.Products where u.ProductID == product.ProdID select u).Single(); Console.WriteLine(rec.Name); } }
Causes this:
A better approach is to structure your LINQ query such that it can obtain the data necessary outside of the loop in a single set based approach. You can then access the local object containing cached data within the loop. Further, the example above should be using projection.
Pssst, it isn’t only LINQ statements that can cause this cursor type behavior. In this follow up post I demonstrate how objects can do it too. Objects are just more sneaky about it.
5. Lazy Loading vs Eager Loading
Last tip. The DBA and SWE should both understand the basic concepts of Lazy Loading vs Eager Loading and how to make EF use one or the other as the situation dictates.
Lazy Loading will only request a subset of the total data potentially required. When implemented right, it takes a similar form to an OFFSET and FETCH statement. Lazy Loading isn’t always bad. If you were looking for a new phone case – you wouldn’t want to wait while Amazon returns every single phone case in the world – you’d want 10-50 at a time to look at. So too would the DB server. Chances are you’ll never get to the end. Don’t pull more data than you need.
Lazy Loading CAN cause performance problems however when entire sets are needed in a process and the application is constantly pestering the SQL Server for additional rows. Simply put, this is akin to just another cursor. I’ll point you to the interwebs for more on this topic, but if you’re in an EF shop, both the DBA and SWEs should understand the concept well.
Remember
Pingback: (SFTW) SQL Server Links 09/01/15 - John Sansom
IMHO this is going to be so important for the DBA of the future (or even today). A decade or so ago programmers had to speak our language; now that they are entrenched in EF/LINQ it seems DBAs have to learn to speak their language but we haven’t.
I have tried to dip my toes into EF but it’s huge because it has gone through so many revisions each adding a completely different way of looking at things (objects to entities, code first, model first, and whatever else); let alone the complexities of LINQ!
What we’re really missing is a succinct guide written *for* busy SQL Server DBAs on the different ways EF/LINQ directly query into the database; with real world examples of the architectures being used and advice on decoding and diagnosing (and fixing) the problems that arise from it.
Instead we only have huge EF/LINQ books for .NET programmers, each of which only covers a small fraction of what’s going on, and not really from the SQL Server perspective that relates to my DBA duties.
Totally agree. This post is a direct result of me trying to dig in and understand the vast abyss that is EF. I know I’ve only scratched the surface of a very BIG ball of yarn.
With the cloud abstracting much of the DBAs responsibilities away from hardware and infrastructure, I see our roles becoming more and more architecturally and code focused.
SWE using ORM is killing performance 🙂 – Thanks for a good read.
Thanks for reading good !!
Hello Russ,
I’ve watched your courses about database security and about using Entity Framework. If I’m allowed to, please let me ask some questions about ORMs and security to you as somebody who is familiar with both topics*.
At my company, we are currently discussing about using ORM (EF6) and Stored Procedures (sprocs) with reference to security:
• use sprocs only, all free** CRUD operations are forbidden on the database to gain security, using an ORM does not provide significant advantages anymore
• use it in a mixed fashion, get the best of both worlds but having trade offs in security
o forbidding free CUD-Operations and perform them via sproc, free Selects are allowed, using the ORMs ability to easily make DB queries
o allow free CRUDs and using the ORMs ability to easily make DB queries and to save modified object graphs
o forbidding all Delete operations …??
Is it a good a way to gain security by forbidding free CRUD? Or is it better to focus on other security mechanisms as you described them in your course with resources saved by using an ORM?
Thanks in advance
Otto
*There are a lot of discussion you can find in the web, but it seems to me, people who are familiar with ORMs are debating with people who are familiar with sprocs with people who are familiar with security – not knowing if anybody is getting the whole picture (https://kevinlawry.wordpress.com/2012/08/07/why-i-avoid-stored-procedures-and-you-should-too/)
** free in the sense of CRUD operations that are not encapsulated in a sproc