Work with the same XML document you investigated in XPath Exercise 1: si-2020-10.xml. (This is the file containing a Site Index of Named Entities in the Digital Mitford Archive.) Open the file in oXygen (and again, don’t be concerned about the schema warnings on the file). Work with the XPath window set to version 3.1. Respond to the XPath questions below in a text or markdown file, and upload it to Canvas for this assignment when you’re finished. (Please use an attachment! If you paste your answer into the text box, Canvas may munch the code formatting.)

Some of these tasks are thought-provoking, and even difficult. If you get stuck, do the best you can, and if you can’t get a working answer, give the expressions you tried and explain where they failed to get the results you wanted. Sometimes doing that will help you figure out what’s wrong, and even when it doesn’t, it will help us identify the difficult moments.

These tasks involve the use of path expressions and predicates, as well as XPath functions such as count() and distinct-values(), and there may be more than one possible answer to each. Consult our introductory guide Follow the XPath! for help with constructing your expressions.
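As a quick refresher before you start, here is a minimal sketch of a path expression, a predicate, and count() working together on a made-up document. The <library> and <book> elements and the @genre attribute are hypothetical and do not appear in the Site Index, so you will need to adapt the patterns to the elements and attributes you actually find there.

```xpath
(: Hypothetical toy document, NOT the Site Index:
   <library>
     <book genre="poetry">...</book>
     <book genre="drama">...</book>
     <book genre="poetry">...</book>
   </library>
:)

(: Path expression: every <book> element anywhere in the document :)
//book

(: Predicate: only the <book> elements whose @genre is "poetry" :)
//book[@genre = "poetry"]

(: Function: how many <book> elements pass that predicate
   (2 for the toy document above) :)
count(//book[@genre = "poetry"])
```

Enter each expression in the XPath window on its own, one at a time. The three lines above are separate expressions, not a single one.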

With the Site Index XML file open in oXygen, use the XPath 3.1 window to construct XPath expressions that do the following. Be sure to give the XPath expression you used in your answer, rather than just reporting your results. This way, if the answer is incorrect, we can help explain what went wrong.


When representing historical people in this document, we have worked on encoding their occupations in an <occupation> element, and we defined a limited set of @type and @subtype attribute values to help associate people with related work. The following questions use XPath to explore what we can learn from our occupation markup:

  1. What XPath returns all the values of the @type attribute on the <occupation> elements?
  2. Let's see if we can read that list of occupation @type values without duplicates. Apply the distinct-values() function to your XPath, and record your expression.
  3. Now let’s chain two functions together! How can you return a count() of those distinct-values()? Record your XPath expression.
  4. We can write XPath to identify people (pull records of <person> elements) based on their nested <occupation> elements and the attributes marked on those elements. You will need to write XPath expressions with predicates, and sometimes nested predicates, to answer the following questions:
    1. Let’s first find all the occupations marked with the attribute name-value pair: type="artist". Write an XPath expression that returns all the <occupation type="artist"> elements.
    2. Now, let’s find the full listings of the artists themselves: How would you return the <person> elements that contain nested <occupation type="artist"> markup?
    3. Who are the women artists listed in our site index? The @sex attribute on each <person> element holds "m", "f", or "u", recording the conventional gender associations of the nineteenth century: male, female, or undetermined. Write an XPath expression that returns <person> elements when the @sex value is "f" and the nested <occupation> has a @type value of "artist".
    4. The @subtype attribute on the <occupation> element holds more specific occupation information. Write an XPath expression that finds all of the <person> elements with an occupation @subtype of "engraver".
    5. Study how the birth and death dates are stored in the person entries. Sometimes when the specific birth or death date is unknown, we have simply encoded a year value. Build on your previous XPath expression to locate the one person in the site index who was an engraver born in the year 1787. Who was it?
    6. What XPath would return the birth dates of all the persons with occupation @subtype of "engraver" in the file?
    7. Use the simple map ! operator to return the string value of the birth dates you located. Now, send all those dates through the XPath sort() function to sort them from earliest to latest. And let's add one more function to the chain: What happens when you add min() to the end? What is the earliest year in which an engraver listed in our file was born?
  5. Explain why the following two XPath expressions return different results. Run each XPath expression, review the results, and explain what you think each expression is returning.