The Art of SQL Server Database Administration, Development, and Career Skills for the Technically Minded
“I don’t always query XML, but when I do; I drink dos equis” – Dan Sorensen.
This is the first tutorial in a complete series on XPath/XQuery. The master post for this series can be found here.
Whether you like it or not, XML is everywhere in SQL Server. It isn’t a matter of if you’ll have to deal with XML as a DBA but when. Take a quick look at the sys.dm_os_ring_buffers. This little gem stores a wealth of knowledge about what’s going on with your server’s memory. If you run a select statement against that dmv however you will find that all the really helpful information is stored within an XML document. XML also pops up in query plans, deadlock graphs, extended events, and a growing number of other areas within SQL Server critical to DBAs.
Grant Fritchey wrote an excellent article last year on navigating the XML message provided by the ring buffer dmv. After you complete my walk-through I challenge you to use XPath to modify, extend, and build upon what he wrote. The key to learning something is using it in practice.
Navigating XML with XPath
Even though some people mistakenly use the two terms interchangeably XPath is actually a subset of XQuery. XQuery implements XPath; but each remain their own standard. Further, Microsoft does not implement 100% of either standard in SQL Server. The XPath and XQuery that you’ll find in SQL Server is a partial set of features defined in the XQuery 1.0 and XPath 2.0 W3C standards. (see link for full MS support document)
XPath is used for navigation of an XML document and can return a sequence of values contained within the element(s) identified by the path. XQuery can leverage XPath but also provides much more functionality of its own such as looping and re-ordering. It’s important to know the difference because while not as powerful, XPath alone is much more performant. I’ll tackle the larger topic of XQuery in my next post.
I’m a fantasy baseball geek so I wrote a simple baseball related XML document to use as an example. Thorough explanations abound on the web if you need a refresher on what makes a document valid XML.
DECLARE @x XML SET @x = '<fantasy> <player Position="3B"> <name>Evan Longoria</name> <team>Rays</team> <slg>512</slg> </player> <player Position="OF"> <name>Carolos Gonzales</name> <team>Rockies</team> <slg>530</slg> </player> <player Position="C"> <name>Buster Posey</name> <team>Giants</team> <slg>486</slg> </player> </fantasy>'; SELECT @x.query('fantasy/player/team')
@x is a variable of XML data type that has been set to contain my sample document. The select statement is using the query method of @x which is one of the various methods available to the XML data type within TSQL. The string ‘fantasy/player/team’ being passed to the query method is a simple XPath statement. Run the query and you get:
Steps and Node Tests
It’s usually easiest to read an XPath statement right to left. Stepping through the statement above we are saying:
It reads much like a folder system or a URI. If you do read it right to left just don’t forget that it must ultimately find it’s way back to the root node ( or at least a level one node if the xml doc isn’t well formed ).
In XPath lingo each section of a path statement divided by a forward slash is called a step. To dive further into XPath lingo, each of the steps above complete a node test.
Hint: specifying a leading slash is optional, the query will work either way. A trailing slash however will produce an error – it causes the query to think a step is missing.SELECT @x.query('fantasy/player/team') -- works SELECT @x.query('/fantasy/player/team') -- works SELECT @x.query('/fantasy/player/team/') -- error
Msg 9341, Level 16, State 1, Line 20
XQuery [query()]: Syntax error near ‘<eof>’, expected a step expression.
It’s important to understand that XPath navigates the entire document. It will always return every element that matches the path provided. It will also always return them in the same order they are found in the document. This is something that throws relational folks off. Unlike relational data, ordinal position of elements matters in XML.
FYI: Attributes are a different story. In the pure XML realm the order of attributes is maintained. In the SQL Server world, when querying XML data types, SQL doesn’t guarantee order of attributes.
Let’s say you’re only interested in the one team that Buster Posey plays for. We need to filter out those elements that we are not interested in via a predicate. This predicate is placed after the path element to be filtered in brackets [ … ]. This predicate can be an ordinal position, an equality/inequality comparison, or a number of other options.
Since we know order matters to XML, let’s look at ordinal position first. If you know that Posey is contained in the third player element in the document; it’s completely valid to pass an XPath statement like this:
There is only one team element that meets this path:
Another predicate you could use to get Buster Posey’s team is to filter the player element by its attribute value. SQL folks get thrown off by the @ sign. In TSQL this is a variable designator, but in XPath it’s shorthand for an attribute.
The result of this query is the same as above. <team>Giants</team>. However we arrived at this result by taking a different path.
This is a good time to point out that XPath is case-sensitive. Position and position are two different words. One does not equal the other. Anytime you get unexpected results from XPath, check case.
Now, let’s say you don’t want the team element. You want the text contained inside of the team element. You can do that with the text() function provided as part of the XPath standard. Let’s try that previous query again, this time with the text() function.
The result now is just: Giants
Data Type and the value() Method
When you run the above query in SSMS you should notice that it is underlined in blue just like the xml snippets returned by the previous queries. Even though the return result is a single string with no markup, it is still of type XML.
If you want the result returned as another value you could cast it, but SQL Server provides a better way. The value method of the XML data type. Unlike the query method that always returns a sequence, to use the value method, you must construct your XPath in a way to satisfy SQL Server that the data returned is singleton. The value method also requires a second parameter that allows you to specify the data type of the value to be returned.
SELECT @x.value( '(/fantasy/player[@Position="C"]/team/text())' ,'varchar(50)') AS TeamName
To satisfy the singleton requirement I enclose the xpath statement in ( ) and then add a predicate that says I only want ordinal position . This is roughly the equivalent of using a TOP 1 designator in a Select Statement.
It’s also important to note that the value method is the only way to obtain the value of an attribute. To obtain the position of the 1st player for example:-- this code works SELECT @x.value('(fantasy/player/@Position)','varchar(5)') -- this code produces an error SELECT @x.query('fantasy/player/@Position')
Msg 2396, Level 16, State 1, Line 27
XQuery [query()]: Attribute may not appear outside of an element
The AXIS ( of evil ?? )
Hopefully up until now this has all been pretty straight-forward. Unfortunately this is where things get tricky; also where most people give up on XPath. Let’s say you want the team that Posey plays for but you do not know the ordinal position of the player element nor do you know any unique player attribute values as in the previous examples.
You should understand XPath enough at this point to be able to navigate to the player name element using a predicate of text()=”Buster Posey”. The question is; how do you get from there to the sibling team element under the same player element? XPath provides an option to navigate a path and then set a reference point known as an axis, and then navigate elsewhere using that point as a reference.
SELECT @x.query(' /fantasy/player/name[text()="Buster Posey"]/../team/text() ')
The query above uses the axis .. which is shorthand for the parent axis. XPath designates a variety of axis options including references for sibling, ancestor, descendant etc. Check the XPath standards document for all axis designators supported by Microsoft in SQL Server.
Stepping through from right to left
The strength of XML lies in its support of semi-structured data, flexibility, and cross platform support, not its ease of navigation for queries. I doubt this post has made you a fan of XPath. I do hope that I’ve at least cleared up the concept a little bit. We’ll build from here in the next post. You can also return to the master post of this series.