What is XPath and how does it work?

Remember how we’ve described XML as like “nested boxes” or a “tree”? (Review that, if you like, on our page explaining XML “So What Exactly is XML anyway?”) Think of a whole XML file as a big box, and nested inside it are smaller boxes. If we think of XML as a tree, we can consider the whole file as springing from one root, and branching out in complex ways.

XPath is a language that we write to select parts (or nodes) of an XML document, so we can pick out the pieces, remix them, add to them, count them, number them, etc. We write what’s called an XPath expression so a computer can follow the path we lay out to certain parts of an XML document. The way we express this needs to be very precise, though often there are multiple ways of creating a path or identifying a location in a document.

Though we sometimes “do XPath” for its own sake while we’re coding our files, to help us find particular things, or check or count something, most often we use XPath expressions within XML-transformation languages, including XQuery (XML Query Language) and XSLT (eXtensible Stylesheet Language Transformations). XPath is a “helper” or ancillary language that’s necessary to work with these transformation languages, which allow us to remix, rewrite, extract pieces from, and add pieces to XML files. Before you can write these transformations, you need to learn XPath.

XML is made of nodes

Think of our nested boxes in XML as made of nodes. A node is a position on the XML tree, and can be one of seven kinds:

  1. the document node: document-node(). (To load a document node from a file or URL, we use the doc('URL') function.)
  2. the element node: element(). Remember, this contains the start and end tags and everything inside: from <element> ... contents ... </element>.
  3. the attribute node: attribute(). This contains an attribute name and its value, as in item="beads" in the element <trade item="beads">.
    (Note, we’ll use the @ symbol to indicate an attribute value from now on.)
  4. the text node: text() which is the text content within an element. (We locate the text node in cases when we want to strip out elements and preserve only the text. This is not to be confused with the string() function which reaches deep into all nested elements to retrieve their text nodes.)
  5. the comment node: comment() to retrieve XML comments
  6. the namespace node: namespace-node(). This represents a namespace declaration that is in scope on an element, which matters when vocabularies are mixed in a document containing, say, both HTML and KML.
  7. the processing-instruction node: processing-instruction(). This contains special processing code with syntax that starts with <? and ends with ?>

This link provides a handy picture of how element nodes and text nodes fit together in a "tree:" David Birnbaum’s XPath intro page on nodes.

We’ve shown you the Outline View in <oXygen/>, and it may be helpful to look at an outline of a file now, to explain how nodes relate to each other.

outline view in oXygen

Element and text nodes are described in XPath like a big family of ancestors, descendants, parents, children, and siblings.

In the Outline view of this HTML file which we took from <oXygen/>, we’ve marked multiple levels of a family relationship: The yellow squared root node <html> is the original ancestor of all the descendants inside. The red squared nodes (the <head> and <body> elements) are the immediate children of the <html> root node. They are also siblings to each other.

What’s the relationship of the <b> node (squared off in pink) to the <body> node? It’s a grandchild, or a descendant. And it’s the child of the <p> element. See how that works?

The elements I’ve squared off in blue may help us to visualize siblings, all the children of <body>. In XPath, we can distinguish these siblings in relation to each other. The second element squared in blue, the <p> element with the text “XPath”, is the "following-sibling" to the <p> element holding the text "Remember how we've described XML..." And we can say that the preceding-sibling to the <ol> element is the <p> element, which holds the text, “Think of our nested boxes in XML...”
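If it helps to see these family relationships outside of <oXygen/>, here is a minimal sketch in Python using the standard-library xml.etree.ElementTree module. The tiny HTML document is a made-up stand-in for the file in the Outline view, and the parent_of lookup table is our own helper (ElementTree does not remember parents for us), so treat this as an illustration rather than the way XPath itself works.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the HTML file shown in the Outline view.
html = ET.fromstring(
    "<html>"
    "<head><title>XPath</title></head>"
    "<body>"
    "<p>Remember how we have described XML</p>"
    "<p>XPath is a <b>language</b></p>"
    "<ol><li>the document node</li></ol>"
    "</body>"
    "</html>"
)

# ElementTree elements keep no link to their parent, so we build a lookup table.
parent_of = {child: parent for parent in html.iter() for child in parent}

# The immediate children of the root element: head and body are siblings.
children_of_root = [child.tag for child in html]

# The b element is a child of p and a grandchild of body.
b = html.find(".//b")
parent_of_b = parent_of[b].tag
grandparent_of_b = parent_of[parent_of[b]].tag

print(children_of_root, parent_of_b, grandparent_of_b)
```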


Axes in XPath

What we’ve just been describing about family relationships is closely connected to axes in XPath. XPath’s default is to locate the child of the current context node (or the current position designated in the XML file). Basically, when you designate an axis in an XPath expression, you’re indicating the direction you want your path expression to go: Do you want it to look up to a parent or ancestor? Or down to a child or descendant?

ancestor:: The ancestor axis sends you to parents and above, all the way up to the root node.
parent:: or .. The parent axis sends you up a short distance, to the immediate parent of the context node.
child:: or / The child axis (which is the default) sends you down to the immediate child of the context node.
descendant:: or // The descendant axis sends you down to the children and their children, and their children’s children, etc.
preceding-sibling:: The preceding-sibling axis sends you to the left to the sequence of siblings that come before the context node (the big brothers and sisters, or earliest children of a parent).
following-sibling:: The following-sibling axis sends you to the right to the sequence of siblings that come after the context node (the little brothers and sisters, or younger children of a parent).
preceding:: The preceding:: axis sends you on a longer path than the preceding-sibling:: axis, as it looks for any nodes that precede the current context node in the document order, which includes the preceding siblings as well as the preceding nodes that are not siblings. We commonly use this axis (or its mirror-image, the following:: axis) when we are looking for preceding instances of a particular element that are not siblings, but nested inside nodes that appear earlier in a document, like looking for all the preceding paragraphs, or lines of a poem when these are nested inside earlier div elements or earlier line-groups. If you think about this axis in terms of the nodes on the XML tree, ancestors of an element are NOT on the preceding axis, even though they begin before the current context! That is because the ancestor node is still open around the current context node while you are invoking it. The ancestor of the current context doesn't precede it because it is currently containing it.
following:: The following:: axis is the mirror-image of the preceding:: axis, so the following:: axis sends you to all the nodes that follow the current context in the document order, which includes the following siblings as well as the following nodes that are not siblings.
self:: or . There’s a self axis designating the current context node, which is useful sometimes when we need to indicate the current location in a path.
attribute:: or @ One last axis that’s sort of in its own parallel universe: the attribute (@) axis! You can follow one of the paths up or down or left or right among elements, and if you want to locate attributes in particular, or want to locate only the elements with a particular attribute or attribute value, you move to the @ axis.

When we write path expressions, we indicate the axis, and we could indicate the name of an element we’re searching for, or another node specifier like the text node or text(). There’s a longhand form for specifying axes, and sometimes that longhand comes up in the XPath window in <oXygen/>, so it can be handy to know what these are and what they mean. Use whatever form of these makes the most sense to you as you’re writing your XPath expressions.
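To make the directions of the axes a little more concrete, here is a rough sketch in Python (standard-library xml.etree.ElementTree). ElementTree does not implement the XPath axes itself, so the helper functions below are our own approximations, they only look at element nodes, and the little poem document is invented for the example. Notice how the ancestors are deliberately left out of the preceding list.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two line-groups (lg) holding numbered lines (l).
poem = ET.fromstring(
    "<div>"
    "<lg><l n='1'/><l n='2'/></lg>"
    "<lg><l n='3'/><l n='4'/><l n='5'/></lg>"
    "</div>"
)
parent_of = {child: parent for parent in poem.iter() for child in parent}
context = poem.find(".//l[@n='4']")  # our context node: line 4

def ancestors(node):
    """Like ancestor:: (parent first, then upward to the root element)."""
    found = []
    while node in parent_of:
        node = parent_of[node]
        found.append(node)
    return found

def preceding_siblings(node):
    """Like preceding-sibling:: (earlier children of the same parent)."""
    sibs = list(parent_of[node])
    return sibs[:sibs.index(node)]

def following_siblings(node):
    """Like following-sibling:: (later children of the same parent)."""
    sibs = list(parent_of[node])
    return sibs[sibs.index(node) + 1:]

def preceding(node):
    """Like preceding:: (elements earlier in document order, minus ancestors)."""
    in_order = list(poem.iter())
    earlier = in_order[:in_order.index(node)]
    still_open = ancestors(node)
    return [n for n in earlier if n not in still_open]

ancestor_names = [n.tag for n in ancestors(context)]
older = [n.get("n") for n in preceding_siblings(context)]
younger = [n.get("n") for n in following_siblings(context)]
before = [n.get("n") or n.tag for n in preceding(context)]

print(ancestor_names, older, younger, before)
```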

How to write an XPath expression

So you want to start writing an XPath expression already! Here we go. Use the XPath window in the upper left of your <oXygen/> screen, and select XPath 3.1 in the drop-down menu to its left. An XPath expression is a kind of journey, and it begins by figuring out where you are (what’s your context node?) and taking a step along an axis direction.

When we start writing XPath expressions, we usually begin from the document node at the top of the tree. (The root element is actually the child in XPath terms of that document node.) Many of our path expressions will begin with a double slash: //, and at the start of an XPath expression that always means, start at the document node and search through all XML nodes that descend from here (on the descendant:: axis) all the way through the XML tree hierarchy. If we start our XPath expression with the double slash, we indicate we are beginning at the top-level document node and working our way down one or more levels to whatever node we indicate next:

//body/div/p

This expression means, start at the beginning node of the document and look for the body element anywhere below. Next, switch to the child:: axis, and take one step to find the immediate children of body–the sequence of div elements (however many there are). Then find all p elements that are the immediate children of all the div elements. XPath expressions are designed to return a sequence of results, whether that's a sequence of zero, one, or many items. If the div elements just do not have any child p elements, we will not see an error message from XPath. We will just return a sequence of zero results.
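If you would like to try the same path outside of <oXygen/>, here is a small sketch in Python with the standard-library xml.etree.ElementTree module. ElementTree only understands a limited subset of XPath, and it wants a leading .// where we wrote //, so this is an approximation of the expression above. The sample document is made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: three div elements, the last one empty.
html = ET.fromstring(
    "<html><body>"
    "<div><p>one</p><p>two</p></div>"
    "<div><p>three</p><span><p>nested, not a child of div</p></span></div>"
    "<div/>"
    "</body></html>"
)

# Roughly //body/div/p: only p elements that are immediate children of a div.
paragraphs = html.findall(".//body/div/p")
texts = [p.text for p in paragraphs]

# No match is not an error: we simply get back an empty list.
tables = html.findall(".//body/div/table")

print(texts, tables)
```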

What if we want to return all the elements at a particular level, regardless of what they are? Here’s where we use the asterisk: *. Examples:

//body/div/*
//body/div//*

parent::*
self::*

Notice how these work: We say, in the first one, start from the document (root) node, and go down to find the body element, then all the div elements that are children of body, and then please return all the elements that are the immediate children of div. How’s the second one different from the first? Notice that we take a longer step down with that double slash: This says, with all those div elements that are children of body, please look down the descendant axis and return all the elements that are either the children or the descendants of div.

With the last two, notice we designate the axis first. The parent::* will return the parent element of the current context node, whatever it is. (“Who’s my mommy?“) The last one I wrote for the sake of writing it: What if we were somehow lost in an XML file and just wanted to return the name and location of the current context node: (“Where am I??” or “Who am I?”). We do not often have to reference the self::* axis, but I just invoked it while typing inside the p element forming this paragraph, and returned the entire element node in which I am currently typing this.
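Here is a quick way to compare the single-slash and double-slash wildcard steps, sketched in Python with xml.etree.ElementTree. The document is invented, and ElementTree spells the parent step as .. rather than parent::*, so take this as an approximation of the expressions above.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with one div that has children and a grandchild.
html = ET.fromstring(
    "<html><body>"
    "<div><h2>Title</h2><p>Some <b>bold</b> text</p></div>"
    "</body></html>"
)

# Roughly //body/div/* : only the immediate element children of div.
children = [el.tag for el in html.findall(".//body/div/*")]

# Roughly //body/div//* : children and deeper descendants of div.
descendants = [el.tag for el in html.findall(".//body/div//*")]

# Roughly parent::* with b as the context node ("Who's my mommy?").
parent_of_b = html.find(".//b/..").tag

print(children, descendants, parent_of_b)
```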

Now, a single slash, /, typically indicates taking a step in a path expression, that is, “take one step down and find the immediate children of the context node.” When we’re writing XQuery and XSLT we’ll find ourselves writing XPath expressions from specific points inside a document (not necessarily from the root or document node up at the top), so in those cases we might actually begin an expression like this:

p/span

This means, starting from the current context node, look for the immediate children named p, and then go down and find all the immediate children of p named span.
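A relative path like this depends entirely on where we are standing. Here is a minimal sketch in Python (xml.etree.ElementTree) where we first pick a context node and then run p/span from there. The document and the choice of the div with id "space" as context are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with two div elements.
body = ET.fromstring(
    "<body>"
    "<div id='intro'><p>Plain <span>one</span></p></div>"
    "<div id='space'>"
    "<p><span>two</span> and <span>three</span></p>"
    "<span>not inside p</span>"
    "</div>"
    "</body>"
)

# Choose a context node, then evaluate the relative path from it.
context = body.find("div[@id='space']")
spans = [s.text for s in context.findall("p/span")]

print(spans)
```

Only the span elements that are children of a p child of the context node come back. The span in the other div, and the span that sits directly inside the context div, are not on this path.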

So how do we work with attributes? They have their own @ axis. Here are some examples of how you can access the attribute axis and what you might find:

//@id

//div/@id

The parents of attributes are technically elements, so very frequently we climb down (or up) an axis to locate a parent element, and then shift over to the attribute axis with the /@, as we did in the second example here. Notice with the first example, though, that this actually starts from the document (root) node, and hunts through the whole document looking for all instances of the @id inside elements wherever it may turn up. If you download this html file that you’re reading, open it in <oXygen/> and run that XPath, click on the results in the bottom view window, and they’ll highlight just the attribute portions of their elements.
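Python's xml.etree.ElementTree cannot return attribute nodes the way //@id does in <oXygen/>, but we can get close by selecting the elements that carry the attribute and then reading its value. This is only a sketch of that workaround, with a made-up document.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: some elements have an id attribute, some do not.
html = ET.fromstring(
    "<html><body id='top'>"
    "<div id='intro'><p id='first'>Hello</p></div>"
    "<div><p>No id here</p></div>"
    "</body></html>"
)

# Close to //@id : find every element below the root that has @id, then read the value.
all_ids = [el.get("id") for el in html.findall(".//*[@id]")]

# Close to //div/@id : the same idea, restricted to div elements.
div_ids = [el.get("id") for el in html.findall(".//div[@id]")]

print(all_ids, div_ids)
```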

Predicates, the filters of XPath: [ ]

Predicates in XPath are filtering tools. When we write these, we want to keep in mind the structure of the document, and the direction that the computer processor “looks” as it’s following your XPath expression. David Birnbaum gives a good flowchart explanation of how this works on the “Predicates” section of his XPath site. When you write a predicate, you indicate how to select a specific thing out of a lineup, a particular paragraph based on its position, for example. You write a predicate using square brackets:

//div/p[1]

//div[@id="space"]/p[1]

So, say we have multiple div elements, each of which has multiple p children. The first expression returns, for each div in turn, its first p child. In my document, I have two divs with p children, so this XPath returns 2 results, the first paragraph only in each of these divs.

The second is a little more complex: Can you work it out? Here we use predicates in two steps of our XPath expression, to streamline results at each level: First, we say, look for ONLY the div element with the @id="space". We filter out all of the other divs which have different @id attribute values, then, and we ONLY look inside this one div that has this distinguishing characteristic. Then, we step down and find its first p child. Make sense?
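Both of these predicate expressions happen to fall inside the small XPath subset that Python's xml.etree.ElementTree supports, so we can sketch them almost word for word (with .// standing in for the leading //). The two-div document below is invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two div elements, each with several p children.
html = ET.fromstring(
    "<html><body>"
    "<div id='time'><p>t1</p><p>t2</p></div>"
    "<div id='space'><p>s1</p><p>s2</p><p>s3</p></div>"
    "</body></html>"
)

# //div/p[1] : the first p child of each div in turn.
first_in_each = [p.text for p in html.findall(".//div/p[1]")]

# //div[@id="space"]/p[1] : filter the divs first, then take the first p.
first_in_space = [p.text for p in html.findall(".//div[@id='space']/p[1]")]

print(first_in_each, first_in_space)
```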

Now, you can actually set multiple predicates [][] right next to each other in an XPath expression. Here’s an example of when you might need to do this: Say you have a file divided up into multiple sections using div elements. Let’s say that several of these div elements are designated with @type, as in <div type="poem"> to indicate a particular kind of content inside. You may have other div elements designated as @type="preface" and @type="acknowledgments". So this file is basically a “book” of poems with some accompanying material, and the div elements separate each poem from the others. Now, let’s say you want to find your way with XPath to the fifth poem in the file. Here’s how you’d do that by making two predicates side by side:

//div[@type="poem"][5]

See how this works? First you say, find me the div elements filtered by @type="poem", and then filter those results to give me just the fifth one in the sequence.
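The order of the two filters matters, and a small experiment can show why. In this Python sketch (xml.etree.ElementTree) we assume a made-up "book" where all the div elements are siblings inside one body. ElementTree does not chain a position after another predicate the way XPath does, so we apply the second filter with ordinary list indexing instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical book: a preface, seven numbered poems, and acknowledgments.
poem_divs = "".join(f"<div type='poem' n='{i}'/>" for i in range(1, 8))
book = ET.fromstring(
    "<body><div type='preface'/>" + poem_divs + "<div type='acknowledgments'/></body>"
)

# First predicate: keep only the poem divs.
poems = book.findall("div[@type='poem']")

# Second predicate [5]: XPath counts from 1, Python lists count from 0.
fifth_poem = poems[4].get("n")

# Without the type filter, the fifth div is a different one,
# because the preface counts as div number 1.
fifth_div_of_any_type = book.find("div[5]").get("n")

print(fifth_poem, fifth_div_of_any_type)
```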

Functions: How to calculate and process information with XPath

Functions let us do some processing of the things we locate with XPath expressions. What if you don’t want to return all the p elements inside a div, and you just want to count them all? That’s when we use a function called count(). You put in parentheses the nodes you want to count, and this can work in lots of ways:

count(//@id)

//div/count(p)

count(//div/p)

//div[@id="space"]/count(p)

In the first of these, I’ve written an expression that says, "Count the number of times the @id attribute is used in this file, starting from the root (document) node." When I run it here in <oXygen/> as I’m writing this document, the answer is 3 (as I see in the bottom window).

The second expression, //div/count(p), says, start from the root node, go down to ALL the div elements in the file, and then count the p elements in EACH div, one by one. My bottom view screen in <oXygen/> gives me a set of three results, with a count for each div in turn. That can be handy!

But what if we want to find out how many p elements there are total under ALL the divs? That’s what the third expression, count(//div/p) does: It gives just ONE result, and it’s a count of ALL the p elements in ALL the divs, without differentiating them div by div by div. Let’s think about why this gives a different result from the previous example: This has everything to do with where you place the count function! When you place the count at the end, and don’t include the div element in the parentheses, you effectively step down the tree to each div, and THEN do a count of the p elements inside each. By contrast, when you say count(//div/p), the computer postpones the counting until it’s found all the p’s everywhere that are children of divs. The count function says, “Hey! round up ALL the p’s that are children of divs, and count those.”

Can you figure out what the last expression is counting? //div[@id="space"]/count(p)

This combines a predicate with a function! So you indicate that you want to start with only one div, the one with the @id="space" . Then you count the number of p elements in that div.
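The difference between counting per div and counting everything at once is easy to see in a few lines of Python. The xml.etree.ElementTree module has no count() function, so in this sketch Python's len() stands in for it, and the three-div document is invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: three div elements with 2, 3, and 0 paragraphs.
html = ET.fromstring(
    "<html><body>"
    "<div id='time'><p/><p/></div>"
    "<div id='space'><p/><p/><p/></div>"
    "<div id='empty'/>"
    "</body></html>"
)

# Like //div/count(p) : step to each div first, then count inside each one.
per_div = [len(div.findall("p")) for div in html.iter("div")]

# Like count(//div/p) : gather every p child of every div, then count once.
total = len(html.findall(".//div/p"))

# Like //div[@id="space"]/count(p) : filter to one div, then count inside it.
space_only = [len(div.findall("p")) for div in html.findall(".//div[@id='space']")]

print(per_div, total, space_only)
```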

Following the introduction of XPath 3.0 (and currently in XPath 3.1), we have an alternative syntax for applying functions using simple map and the arrow operator. We find these new notations a little easier to read and write, so we have prepared a short tutorial on how to use them on our DHClass-Hub and now use these notations in class.

Using Functions and Predicates Together: Some Complex XPath!

You can actually put functions inside predicates in XPath as a way to filter your results! Here’s an example: Say we want to find any and all div elements that have just one p child? We’d write that expression this way:

//div[count(p) = 1]

Here’s another example. What if we want to return a div element that doesn’t have ANY p elements? Here we’d use a predicate again, and this time use a function called not().

//div[not(p)]

About these examples: Notice that when you put a function in an XPath predicate, you don’t have to indicate that you’re stepping down a tree to get the child element. The default expectation when we don’t give an axis is that we’re stepping down to the immediate child.
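For comparison, here is how the same two filters might look in Python, where a list comprehension plays the role of the predicate. ElementTree does not understand count() or not() inside a predicate, so this sketch does the testing in Python itself. The document is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: div elements with one, two, and zero p children.
html = ET.fromstring(
    "<html><body>"
    "<div id='a'><p>only one</p></div>"
    "<div id='b'><p>first</p><p>second</p></div>"
    "<div id='c'><ul><li>no paragraphs here</li></ul></div>"
    "</body></html>"
)
divs = list(html.iter("div"))

# Like //div[count(p) = 1] : keep the divs with exactly one p child.
one_p = [d.get("id") for d in divs if len(d.findall("p")) == 1]

# Like //div[not(p)] : keep the divs with no p child at all.
no_p = [d.get("id") for d in divs if not d.findall("p")]

print(one_p, no_p)
```

Just as in the XPath versions, findall("p") looks only at the immediate children of each div, because no other direction was given.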

Quite often in writing XPath, we’re trying to find something in terms of its position. We’ve given you examples of predicates that return a specific numerical position, as in the first or fifth paragraph or div element, using [1] or [5]. But what if you don’t have an exact position, but know you want to return, say, the first 10 of a particular element? Here’s where we can use the position() function.

//div[@type="poem"][position() <= 10]

This retrieves the first 10 <div type="poem"> elements in an XML collection of poems chunked in divs. The predicate expressions work to filter by grabbing all the appropriate “poem” divs whose position is 10 or under. Note that nothing goes inside the parentheses for position(). Context functions like this keep their parentheses empty. (Basically, position() and last() are written differently from the mathematical functions like count(), where we indicate a particular thing to be counted inside the parentheses.)

Here’s another useful positional function: last()

As you might expect, last() returns the last of a thing you’ve designated. This is another positional function frequently used in predicates. So, where //p[1] returns every <p> element that is the first <p> child of its parent, //p[last()] returns every <p> that is the last <p> child of its parent. To get the single first or last <p> in the whole document, wrap the path in parentheses before filtering: (//p)[1] and (//p)[last()]. This is handy because we usually don’t know how many p elements (or paragraphs) there are in a big file! Again, note that nothing goes inside the parentheses for this positional function.
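Here is a short Python sketch (xml.etree.ElementTree) of both positional ideas. ElementTree happens to understand last() in a predicate, but it has no position() function, so a list slice stands in for "positions 1 through 10". The collection of twelve poem divs is invented, and we assume they are all siblings.

```python
import xml.etree.ElementTree as ET

# Hypothetical collection: twelve poem divs, all siblings inside one body.
poem_divs = "".join(f"<div type='poem' n='{i}'/>" for i in range(1, 13))
book = ET.fromstring("<body>" + poem_divs + "</body>")

poems = book.findall("div[@type='poem']")

# The first ten poems: positions 1 through 10 become the slice [:10].
first_ten = [d.get("n") for d in poems[:10]]

# The last div, without needing to know how many there are.
last_poem = book.find("div[last()]").get("n")

print(first_ten, last_poem)
```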

XPath has a little over 100 different functions to choose from, but in most project scenarios, you really only need to work with a handful, which you’ll get to know pretty well. (One of our favorites is distinct-values(), which we’ll show you. We use distinct-values to eliminate repetition in a list of results, so we get only the distinctly different things, like the names of elements we’ve used, for example.) Don’t try to memorize these, but you’ll need to look them up periodically and find one to use. Here are some useful XPath function lists, with explanations of what the functions do and how they work:

Here is a handy quick review of terms and symbols. Check out the combination of predicates and functions on that page in the expression with string-length() just above the review: See if you can understand it.
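To give a feel for what distinct-values() does with element names, here is a rough Python equivalent: collect every element name, then let a set throw away the repeats. This is only an analogy under our own assumptions. The document is made up, and we sort the result just to get a stable order for printing, which is our choice rather than something distinct-values() promises.

```python
import xml.etree.ElementTree as ET

# Hypothetical document that reuses the same element names many times.
html = ET.fromstring(
    "<html><body>"
    "<div><p>one</p><p>two</p></div>"
    "<div><p>three <b>bold</b></p></div>"
    "</body></html>"
)

# Every element name, repeats included.
all_names = [el.tag for el in html.iter()]

# Only the distinctly different names.
distinct_names = sorted(set(all_names))

print(len(all_names), distinct_names)
```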

Testing for comparisons with XPath

We often write XPath predicates to find out where a count() of something is no higher than X number, or where an attribute value in one position (say //sp[@who="#NelsonNell"]) is equal to a value in another position (say //persName[@ref="#NelsonNell"]) in our XML documents. To express conditions of comparison, where one value compares a certain way with another value, we use comparison operators. XPath can express two types of comparison: value comparison and general comparison. We reproduce here Obdurodon’s explanation of value comparison and general comparison with examples of each. The next sections on value comparison and general comparison are quoted from the indispensable resource, The Xpath functions we use the most.

Value comparison

The value comparison operators are: eq, ne, lt, le, gt, and ge.

Value comparison can be used only to compare exactly one item to exactly one other item. For example, to create a predicate that will filter <sp> elements to keep only those where the value of the associated @who attribute is equal to the string hamlet, we can write:

//sp[@who eq 'hamlet']

Since each <sp> has exactly one @who attribute and since we are comparing it to a single string, the test will return True or False for each <sp> in the document. Because the exactly one item can be an empty sequence (technically no items), the test will also work (and return False) when an <sp> element has no @who attribute. It is, however, an error if either side of the comparison contains a sequence of more than one item.

Value comparison is often used for numerical values. To keep all of the speeches (<sp> elements) with more than 8 line (<l>) descendants, we can write:

//sp[count(descendant::l) gt 8]

In the preceding example, the output of the count() function is a single item, an integer, and it is being compared to another single item, the integer value 8.
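As a rough illustration of these two value comparisons, here is a Python sketch using xml.etree.ElementTree. ElementTree has no eq or gt operators, so we lean on its simple attribute test for the first expression and on Python's own > for the second. The miniature play is invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical play: three speeches, one of them without a @who attribute.
play = ET.fromstring(
    "<body>"
    "<sp who='hamlet'><l/><l/></sp>"
    "<sp who='ophelia'>"
    "<lg><l/><l/><l/><l/><l/></lg>"
    "<lg><l/><l/><l/><l/></lg>"
    "</sp>"
    "<sp><l/></sp>"
    "</body>"
)

# Like //sp[@who eq 'hamlet'] : one attribute value compared with one string.
by_hamlet = play.findall(".//sp[@who='hamlet']")

# Like //sp[count(descendant::l) gt 8] : one integer compared with another.
long_speeches = [
    sp.get("who") for sp in play.iter("sp") if len(list(sp.iter("l"))) > 8
]

print(len(by_hamlet), long_speeches)
```

The speech with no @who attribute is quietly left out of the first result, and the lines are counted at any depth, so the nine lines nested inside line-groups all count toward the second.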

General comparison

The general comparison operators are: =, !=, <, <=, >, and >=.

While value comparison operators can compare only one thing on the left to one thing on the right, general comparison operators can have one or more items on either side of the comparison (also zero items, since the empty sequence is also allowed). For example:

//sp[@who = ('hamlet', 'ophelia')]

will retain all <sp> elements where the @who attribute is equal to either hamlet or ophelia. This makes general comparison a convenient alternative to a complex predicate like:

//sp[@who eq 'hamlet' or @who eq 'ophelia']

In comparisons with exactly one item on either side of the comparison operator, value comparison and general comparison are equivalent.

One possibly surprising feature of general comparison is the way it behaves with negation. Consider:

//sp[@who != ('hamlet', 'ophelia')]

This does not find all speeches by anyone other than Hamlet or Ophelia! It finds all speeches where the @who attribute is not equal to any one of the individual items in the sequence on the right. This means that it finds all speeches without exception, since the ones by Hamlet are not by Ophelia (the test succeeds because @who is not equal to ophelia in situations where it is equal to hamlet) and vice versa.

So how do you find all speeches by anyone other than Hamlet or Ophelia? Try:

//sp[not(@who = ('hamlet', 'ophelia'))]

The preceding predicate says that we want to keep all speeches where it is not the case that the @who attribute is equal to either hamlet or ophelia.
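The surprising behavior of != is easier to believe once you see it run. In this Python sketch, general_eq and general_ne are hypothetical helper functions of our own that imitate the "is there any pair that passes the test?" rule of general comparison. They are not part of any XPath library, and the list of speakers is made up.

```python
def general_eq(left, right):
    """Imitates XPath '=': true if ANY item on the left equals ANY item on the right."""
    return any(a == b for a in left for b in right)

def general_ne(left, right):
    """Imitates XPath '!=': true if ANY item on the left differs from ANY item on the right."""
    return any(a != b for a in left for b in right)

speakers = ["hamlet", "ophelia", "gertrude"]  # hypothetical @who values
excluded = ("hamlet", "ophelia")

# Like //sp[@who != ('hamlet', 'ophelia')] : every speaker differs from at least one name.
kept_by_ne = [who for who in speakers if general_ne([who], excluded)]

# Like //sp[not(@who = ('hamlet', 'ophelia'))] : keep only speakers equal to neither name.
kept_by_not_eq = [who for who in speakers if not general_eq([who], excluded)]

print(kept_by_ne, kept_by_not_eq)
```

An empty list on the left (imitating an sp with no @who at all) behaves differently in the two forms: general_ne([], excluded) is False, while not general_eq([], excluded) is True.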

Summary of comparison operators

Description                                   Value   General
Equal to                                      eq      =
Not equal to                                  ne      !=
Greater than                                  gt      >   (&gt;)
Greater than or equal to (not less than)      ge      >=  (&gt;=)
Less than                                     lt      <   (&lt;)
Less than or equal to (not greater than)      le      <=  (&lt;=)

Resources for further reading

For more guidance on how to write XPath expressions with functions and predicates, please see: