The Art of SQL Server Database Administration, Development, and Career Skills for the Technically Minded
Whether you can admit it to yourself or not, data without relation or context is pointless. Let’s say I have a database with 14 TB of integers between 0 and 100, with the occasional outlier as high as 206. Question for you: is the number 80 good? Is it bad? Am I happy when I find it more often, or less often? If the average value were greater or less than 80, would people live happier, healthier lives?
If those numbers come in faster and faster and change more often than they used to, I suddenly have BIG DATA, but does labeling it that give you any clue as to whether or not 80 is desirable?
I’m a relational guy by nature; I am interested in what purpose my data serves. Conversely, I really HATE the fact that buzzwords have no relation to solutions.
That’s like you saying my database above, with no more context than I’ve given you, should always have numbers less than 40.
At least BIG DATA, another buzzword of our time, has some contextual relation to the four Vs (volume, variety, velocity, and veracity). It’s still a buzzword, but at least it provides some meaningful context if you remember what it meant back before the marketing people got a hold of it.
The NoSQL movement really irks me, but not because I think databases like MemcacheDB or DocumentDB or Riak are pointless. I actually think they have a very critical place in data storage, and there is an extremely strong argument for using them – IN SPECIFIC USE CASES.
My problem with the NoSQL movement, as I finally clued into while reading a Steve Jones post this morning, is that the entire movement has very little context or relation. Up until now it’s been driven completely by people who have a hard time conceptualizing joins and feel like this label means they don’t have to anymore.
Saying I am a proponent of NoSQL is literally saying, I prefer storing my data in any way possible that doesn’t require me to use the SQL language to query it. Beyond that, I don’t really care how it’s stored. Or, in short, I am someone who shouldn’t be trusted with your data. It’s like answering the question, “what would you like to drink?”, with the answer, “I don’t like milk”. Awesome, here is a big glass of cough syrup, enjoy!
If you were choosing a database it would go something like this:
    -- first, what are my choices?
    SELECT Category
    FROM DatabaseCategory;

    /* results could include any of the following:
       Key Value
       Tuple Store
       Object
       Tabular
       Document Store
       Wide Column Store
       Graph
       Multi-Value
       Relational
       Multi-Model
    */

    -- just kidding, I don't actually care, as long as it's NoSQL
    SELECT TOP 1 DatabaseName
    FROM DatabaseList
    WHERE CatKey NOT IN (
        SELECT CatKey
        FROM DatabaseCategory
        WHERE Category = 'Relational'
    );
Next time someone asks you how data should be stored, provide them an opportunity to define context and relation before making a decision.
It might look something more like this:
    /* Our application is going to use small schema on demand
       in the form of JSON, but it's critical that we maintain
       ACID compliance due to the nature of the decisions
       those documents drive */
    SELECT DatabaseName
    FROM DatabaseList DL
    INNER JOIN DatabaseCategory DC
        ON DL.CatKey = DC.CatKey
    WHERE ACIDCompliant = 1
      AND Category = 'Document';
I may still have some choices to make but at least I have relational context between my solution and my problem.
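If you want to play with the two approaches yourself, here is a minimal sketch that runs both queries against an in-memory SQLite database from Python. Everything in it is an assumption for illustration: the column types, the made-up database names, and their ACID flags are invented and do not describe any real product. SQLite has no TOP, so the "anything but relational" query uses LIMIT 1, and I added an ORDER BY so the result is repeatable.

```python
import sqlite3

# Hypothetical schema and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DatabaseCategory (
    CatKey   INTEGER PRIMARY KEY,
    Category TEXT NOT NULL
);
CREATE TABLE DatabaseList (
    DatabaseName  TEXT PRIMARY KEY,
    CatKey        INTEGER NOT NULL REFERENCES DatabaseCategory (CatKey),
    ACIDCompliant INTEGER NOT NULL  -- 1 = yes, 0 = no
);
INSERT INTO DatabaseCategory VALUES
    (1, 'Relational'), (2, 'Document'), (3, 'Key Value');
INSERT INTO DatabaseList VALUES
    ('ExampleRelationalDB', 1, 1),
    ('ExampleDocDB_A',      2, 1),
    ('ExampleDocDB_B',      2, 0),
    ('ExampleKeyValueDB',   3, 0);
""")


def anything_but_relational():
    """The buzzword-driven choice: first database that is not relational."""
    row = conn.execute("""
        SELECT DatabaseName
        FROM DatabaseList
        WHERE CatKey NOT IN (
            SELECT CatKey FROM DatabaseCategory WHERE Category = 'Relational'
        )
        ORDER BY DatabaseName
        LIMIT 1
    """).fetchone()
    return row[0]


def candidates(category, acid_required):
    """The context-driven choice: filter on what the application needs."""
    rows = conn.execute("""
        SELECT DL.DatabaseName
        FROM DatabaseList DL
        INNER JOIN DatabaseCategory DC ON DL.CatKey = DC.CatKey
        WHERE DC.Category = ?
          AND DL.ACIDCompliant >= ?
        ORDER BY DL.DatabaseName
    """, (category, int(acid_required))).fetchall()
    return [name for (name,) in rows]


print(candidates("Document", True))  # ['ExampleDocDB_A']
```

The first function never looks at what the application needs, so whatever it returns is an accident of the data. The second one narrows the list using the requirements you actually stated.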