Russ Thomas – SQL Judo

The Art of SQL Server Database Administration, Development, and Career Skills for the Technically Minded

XQuery for the SQL Server DBA: Namespaces

Do you hate XQuery? You’re not alone… heck, I’m writing this blog post and I’m not that big a fan of it. Dr David DeWitt has said a couple times that he’s no fan (PASS keynote at 134:50).  In a recent podcast he said he thought Microsoft would be better off deprecating it (SqlDownUnder at 58:50) – for full context of statements listen to discussion re XML and JSON at 13:30 – very interesting stuff.

aggrivation

Reality

As a data type – I don’t think it’s going anywhere. Semi-structured data exists; and as a storage mechanism XML is the standard for storing it.  If it exists as a data type – users will need to query it – and that’s where the problem lies; getting that data back out.

You could argue “ok fine, it’s a data type but querying it belongs at the application level”.  Where does that leave the DBA?  Microsoft makes quite a bit of the information we use available as XML only.  So without a mechanism for querying it within SQL Server we’d be stuck.  For better or worse, to query XML right now our option is XQuery.

What do you as a DBA need to know?

I covered the basics of XPath in my last post.  From here on out I’ll tackle the larger topic of XQuery as a whole.  It’s a massive topic – no way possible to fit in a single post.  Heck, I could write a book about what I know and multiple volumes on what I don’t.  So, I’m going to break this up over a few posts.  The examples below assume that you have a basic understanding of XPath.  See previous post if you don’t.

Let’s get it started with Namespaces – what they are, and how they affect your query.


Namespaces

In my last post I used a basic XML document to represent hitters on my fantasy baseball team.  Let’s add a couple pitchers to the document.

<fantasy>
  <player Position="3B">
    <name>Evan Longoria</name>
    <team>Rays</team>
    <slg>512</slg>
    <k>702</k>
  </player>
  <player Position="OF">
    <name>Carolos Gonzales</name>
    <team>Rockies</team>
    <slg>530</slg>
    <k>624</k>
  </player>
  <player Position="P">
    <name>Justin Verlander</name>
    <team>Tigers</team>
    <era>3.41</era>
    <k>1671</k>
  </player>  
  <player Position="P">
    <name>Clayton Kershaw</name>
    <team>Dodgers</team>
    <era>2.60</era>
    <k>1206</k>
  </player>
</fantasy>

Pitchers share some elements with hitters but introduce their own stats such as <era>.  No problem. But what about the K stat?  For hitters it’s how many times they struck out.  For pitchers it’s how many strikeouts they threw.  For a batter; low numbers are good.  For a pitcher; the higher the better.  When querying, summing, analyzing Ks you wouldn’t want to mix this stat, it would throw off your numbers.

XML uses namespaces to resolve the ambiguity between elements with the same name.

You almost always see a Namespace as a URI, but despite W3C strong recommendation that it be a valid URI the only actual requirement for SQL is that it be a character string unique to the document.  With that said, I would still recommend URIs because they are guaranteed unique and can also reference other details such as schema definition, origin of document, documentation via URL.  Further, unless you wrote the XML document you don’t really have control over what namespace is – so best plan on it being a URI.

The XML document declares a namespace by placing xmlns: in the root element.  After the colon you specify a prefix that it will be known by xmlns:f1 and then set’s it equal to the unique URI.  In my example below I use a made up URL of mlb.com.

<f1:fantasy xmlns:f1="http://mlb.com/stats/hitters"
            xmlns:p1="http://mlb.com/stats/pitchers">
  <f1:player Position="3B">
    <f1:name>Evan Longoria</f1:name>
    <f1:team>Rays</f1:team>
    <f1:slg>512</f1:slg>
    <f1:k>702</f1:k>
  </f1:player>
  <f1:player Position="OF">
    <f1:name>Carolos Gonzales</f1:name>
    <f1:team>Rockies</f1:team>
    <f1:slg>530</f1:slg>
    <f1:k>624</f1:k>
  </f1:player>
  <p1:player Position="P">
    <p1:name>Justin Verlander</p1:name>
    <p1:team>Tigers</p1:team>
    <p1:era>3.41</p1:era>
    <p1:k>1671</p1:k>
  </p1:player>  
  <p1:player Position="P">
    <p1:name>Clayton Kershaw</p1:name>
    <p1:team>Dodgers</p1:team>
    <p1:era>2.60</p1:era>
    <p1:k>1206</p1:k>
  </p1:player>
</f1:fantasy>

Remind me why I care:

You care because some of the XML documents you might want to query as a DBA will contain namespaces.  The ShowPlanXML related to query plans within the plan cache for example.

SELECT TOP 10
        [text] AS query_statement ,
        query_plan
FROM    sys.dm_exec_cached_plans cp
        CROSS APPLY sys.dm_exec_sql_text(plan_handle)
        CROSS APPLY sys.dm_exec_query_plan(plan_handle)
WHERE   cacheobjtype = 'Compiled Plan'
        AND [text] NOT LIKE '%sys.dm_%';

namespace


Practice:

To begin working with namespaces I’ll give you a couple quick queries.

declare @x xml 
SET @x = '<f1:fantasy xmlns:f1="http://mlb.com/stats/hitters"
            xmlns:p1="http://mlb.com/stats/pitchers">
  <f1:player Position="3B">
    <f1:name>Evan Longoria</f1:name>
    <f1:team>Rays</f1:team>
    <f1:slg>512</f1:slg>
    <f1:k>702</f1:k>
  </f1:player>
  <f1:player Position="OF">
    <f1:name>Carolos Gonzales</f1:name>
    <f1:team>Rockies</f1:team>
    <f1:slg>530</f1:slg>
    <f1:k>624</f1:k>
  </f1:player>
  <p1:player Position="P">
    <p1:name>Justin Verlander</p1:name>
    <p1:team>Tigers</p1:team>
    <p1:era>3.41</p1:era>
    <p1:k>1671</p1:k>
  </p1:player>  
  <p1:player Position="P">
    <p1:name>Clayton Kershaw</p1:name>
    <p1:team>Dodgers</p1:team>
    <p1:era>2.60</p1:era>
    <p1:k>1206</p1:k>
  </p1:player>
</f1:fantasy>'

SELECT @x.query('

    declare namespace f1="http://mlb.com/stats/hitters";

    (/f1:fantasy/f1:player/f1:name)[2]

')

You will get:

xpath_ns

The query first declares a namespace matching the one in the document for hitters. The next line is an XPath statement very similar to those I demonstrated in my last post. The only difference is now I have to prefix them all with the namespace that they belong to. It adds syntax for sure – xml is nothing if it isn’t chatty – but you get the point.

Now pay attention – I’m going to throw you a curve.

Let’s run another query, this time against pitchers:

SELECT @x.query('

    declare namespace f1="http://mlb.com/stats/hitters";
    declare namespace Q1="http://mlb.com/stats/pitchers";

    (/f1:fantasy/Q1:player/Q1:name)[2]

')

You will get:

xpath_ns2

Did you notice that I declared my namespace as Q1 even though in the document it’s p1. That was on purpose. You see, the namespace is the URI – not it’s prefix. The prefix in the document doesn’t have to match the prefix you declare for the query – it’s just a placeholder.

Next Post – Sneak Peak

Comparing Kershaw strikeouts to Verlander’s strikeouts is only meaningful when you divide by total games pitched.  How else would you compare a 10 year vet against a 2nd year sophomore.  In my next post we’ll mix a little math into our XQuery as well as some other helpful nuggets. You can also return to the master post of this series.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Information

This entry was posted on December 13, 2013 by in XPath XQuery XML.
%d bloggers like this: