The Art of SQL Server Database Administration, Development, and Career Skills for the Technically Minded
The topic this month is learn something new and then teach it to everyone else. Great topic Mike!
Late last year I set the goal to finish up my MCSE. I had passed the previous tests, earning the MCSA certification, but Microsoft was clever in their naming of these certs. It’s hard to get too excited about MCSA. After all, I don’t want to just be “associated” with SQL Server. I want to be an “expert”. So, I decided to break down and take the second set of tests to “level up”. I finished the first one and will be taking the final one in the next few weeks – hopefully. I guess currently that means I am only 80% expert.
While studying it became evident that I would need to understand memory optimized tables, index options, natively compiled procedures etc. I had curiously peered into the topic before but hadn’t ever taken a real world project or learning opportunity to “know” in memory features. So I finally decided to get up to speed on the new craze in SQL Server. In-memory OLTP. You find evidence of this in last month’s TSQL Tue in which I posted about the improvements some of our sample ETL processes experienced by using in memory tables on the stage layer of a major ETL project.
As my submission for this month’s rotating blog party, which by the way is hosted by my good friend Mike Donnelly who you can find on Twitter @SQLMD and as my entry for this week’s #sqlnewblogger challenge, I’d like to share something else I learned along the way. Hash indexes!
Once upon a time a friend of mine who is a DBA on another platform asked what kind of indexes SQL Server used. My answer was, B-Tree of course… is there any other kind? He started going on about all the index options he had, Binary, Hash, B-Tree, blah blah blah. I was quick to point out we also had XML indexes, but he wasn’t impressed.
I never got the big deal. B-Tree always seemed sufficient to me…. But, enter In-memory OLTP tables. These tables are not stored normally in memory. There is no clustered index. Architecturally it’s a whole different storage structure going on up there in RAM. To compliment this new in-memory table structure SQL Server finally added what my friend had tried to make me so inadequate about all those years ago… an alternate index style. The HASH index.
If you know how a B-Tree works you know that to locate a single value you must traverse the tree. You have to start at the root node, move through non leaf nodes, and finally arrive at your value, or values of interest. Typically this is just a couple of hops and far better that a full database scan – so along the way most of us, like me, have come to think this was as good as it gets.
A Hash index doesn’t work this way. Every index key has a pointer and the hash index jumps directly to it. Any lookup performed, no mater the size of the table is single hop, no traversing a tree. (pssst, that’s also why you need to have more hash buckets than projected key entries… ).
I remember taking a practice test for the MCSE and it asked me how many hash buckets I thought a specific scenario would need. My options were, 0, 1, 2, or 5 Million. I of course thought 2 was plenty. Shows what I know, ahem, or knew. Better keep studying.
As a side note, if more than one row is identified by a single hash, this is called a collision. When collisions occur, SQL Server resorts to a scan to identify the row of interest. Meaning, too few hash buckets or frequent collisions can quickly kill the benefit a hash index offers.
The down side however is this. Hash indexes have no order. Hash indexes are good for looking up one entry at a time and one entry only. That’s important. Hash indexes have no concept of the next closest entry. So, returning a range of values, an inequality, or asking return results to be “in order” is very costly on a table with only a Hash index. The resulting query plan will demonstrate that. On the flip side, doing a single value lookup… is really fast. Faster than a B-Tree by a non trivial amount – especially in large numbers.
So, as was happening a lot in the project I detailed last month, if you are doing a lot of lookups, like we were as part of our ETL, and the return is a single value, and the source is an In-Memory table… try a Hash index as well as an in-memory staging layer. Super fast. That’s what I learned. Now you know too.