Russ Thomas – SQL Judo

The Art of SQL Server Database Administration, Development, and Career Skills for the Technically Minded

Your NoSQL Choice Needs More Relational Algebra

Whether you can admit it to yourself or not, data without relation or context is pointless.  Let’s say I have a database with 14 TB of integers between 0 and 100, with the occasional outlier as high as 206.  Question for you: is the number 80 good?  Is it bad?  Am I happy when I find it more often, or less often?  If the average of my values were greater or less than 80, would people live happier, healthier lives?

If those numbers come in faster and faster and change more often than they used to, I suddenly have BIG DATA, but does labeling it that give you any clue as to whether or not 80 is desirable?

I’m a relational guy by nature; I am interested in what purpose my data serves.  Conversely, I really HATE the fact that buzzwords have no relation to solutions.

That’s like you saying my database above, with no more context than I’ve given you, should always have numbers less than 40.

At least BIG DATA, another buzzword of our time, has some contextual relation to the four Vs (volume, variety, velocity, and veracity).  It’s still a buzzword, but at least it provides some meaningful context if you remember what it meant back before the marketing people got a hold of it.

The NoSQL movement really irks me, and not because I think databases like MemcacheDB or DocumentDB or Riak are pointless.  I actually think they have a very critical place in data storage and make an extremely strong argument for usage – IN SPECIFIC USE CASES.

My problem with the NoSQL movement, as I finally clued into while reading a Steve Jones post this morning, is that the entire movement has very little context or relation.  Up until now it’s been driven completely by people who have a hard time conceptualizing joins and feel like this label means they don’t have to anymore.

Saying I am a proponent of NoSQL is literally saying, I prefer storing my data in any way possible that doesn’t require me to use the SQL language to query it.  Beyond that, I don’t really care how it’s stored.  Or, in short, I am someone who shouldn’t be trusted with your data.  It’s like answering the question, “what would you like to drink?”, with the answer, “I don’t like milk”.  Awesome, here is a big glass of cough syrup, enjoy!

If you were choosing a database it would go something like this:

-- first, what are my choices?
SELECT  Category
FROM    DatabaseCategory;

/* results could include any of the following:
Key Value
Tuple Store
Object
Tabular
Document Store
Wide Column Store
Graph
Multi-Value
Relational
Multi-Model
*/

-- just kidding, I don't actually care, as long as it's NoSQL

SELECT TOP 1
        DatabaseName
FROM    DatabaseList
WHERE   CatKey NOT IN ( SELECT  CatKey
                        FROM    DatabaseCategory
                        WHERE   Category = 'Relational' );

Next time someone asks you how data should be stored, provide them an opportunity to define context and relation before making a decision.

It might look something more like this:

/* Our application is going to use small schema on demand
in the form of JSON, but it's critical that we maintain
ACID compliance due to the nature of the decisions those documents drive */

SELECT  DL.DatabaseName
FROM    DatabaseList DL
        INNER JOIN DatabaseCategory DC ON DL.CatKey = DC.CatKey
WHERE   DL.ACIDCompliant = 1
        AND DC.Category = 'Document Store';

I may still have some choices to make but at least I have relational context between my solution and my problem.
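
The queries above lean on a DatabaseCategory table and a DatabaseList table that are never actually defined here. As a rough sketch only, something like the following T-SQL would let you run them. The table and column names come from the queries (assuming the category column is named Category throughout); the data types, keys, and sample rows are assumptions, and the database names are made-up placeholders, not real products.

```sql
-- Hypothetical schema: names are taken from the queries in this post,
-- but the data types, constraints, and rows below are assumptions.
CREATE TABLE DatabaseCategory
(
    CatKey   INT         NOT NULL PRIMARY KEY,
    Category VARCHAR(50) NOT NULL UNIQUE
);

CREATE TABLE DatabaseList
(
    DatabaseName  VARCHAR(100) NOT NULL PRIMARY KEY,
    CatKey        INT          NOT NULL
                  REFERENCES DatabaseCategory ( CatKey ),
    ACIDCompliant BIT          NOT NULL
);

-- A few of the categories from the list above
INSERT INTO DatabaseCategory ( CatKey, Category )
VALUES  ( 1, 'Key Value' ),
        ( 2, 'Document Store' ),
        ( 3, 'Graph' ),
        ( 4, 'Relational' );

-- Placeholder products, invented purely for illustration
INSERT INTO DatabaseList ( DatabaseName, CatKey, ACIDCompliant )
VALUES  ( 'ExampleKeyValueDB', 1, 0 ),
        ( 'ExampleDocDB_A', 2, 1 ),
        ( 'ExampleDocDB_B', 2, 0 ),
        ( 'ExampleGraphDB', 3, 1 ),
        ( 'ExampleRelationalDB', 4, 1 );
```

With rows like these loaded, the difference between the two approaches is easy to see: the "anything but relational" query hands back whichever non-relational row the engine happens to reach first, since TOP 1 has no ORDER BY, while the second query only considers document stores flagged as ACID compliant.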

 


This entry was posted on December 1, 2015 in Career Skills.