First, download the XML file linked here: si-2020-10.xml. (This file contains a Site Index of Named Entities in the Digital Mitford Archive.) Open the file in oXygen (and don’t be concerned about the schema warnings on the file). Work with the XPath window set to version 3.1. Respond to the XPath questions below in a text or markdown file, and upload it to Canvas for this assignment when you’re finished. (Please use an attachment! If you paste your answer into the text box, Canvas may munch the code formatting.)

Some of these tasks are thought-provoking, and even difficult. If you get stuck, do the best you can, and if you can’t get a working answer, give the answers you tried and explain where they failed to get the results you wanted. Sometimes doing that will help you figure out what’s wrong, and even when it doesn’t, it will help us identify the difficult moments.

These tasks involve path expressions and predicates, as well as the XPath function count(), and there may be more than one possible answer. Consult our introductory guide, Follow the XPath!, for help with constructing your expressions.
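
As a warm-up away from the Site Index, here is a minimal sketch in Python of the path steps and predicates named above. It relies on the standard library's xml.etree.ElementTree, which supports only a limited subset of XPath (not the 3.1 engine in oXygen), and on a toy document whose element and attribute names (library, shelf, box, book, note, title, genre) are invented for illustration. Each comment gives the XPath expression you could type in oXygen if you saved the toy document as a file.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy document: these names are invented and unrelated to the Site Index.
TOY = """
<library>
  <shelf genre="poetry">
    <book><title>Odes</title></book>
    <book><title>Sonnets</title><note><title>Cited work</title></note></book>
  </shelf>
  <shelf>
    <box><book><title>Letters</title></book></box>
  </shelf>
</library>
"""
root = ET.fromstring(TOY)

# XPath: //book/title  (title elements that are children of a book)
child_titles = root.findall(".//book/title")

# XPath: //book//title  (title elements at any depth below a book)
descendant_titles = root.findall(".//book//title")

# XPath: //shelf[book]  (predicate: shelf elements that have a book child)
shelves_with_books = root.findall(".//shelf[book]")

# XPath: //*[@genre]  (any element, whatever its name, that carries a genre attribute)
with_genre = root.findall(".//*[@genre]")

print(len(child_titles), len(descendant_titles))
print(len(shelves_with_books), len(with_genre))
```

Running the sketch and comparing the two title counts is a quick way to check your intuition about / and // before you try the same steps on the much larger Site Index.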

With the Site Index XML file open in oXygen and the XPath window set to version 3.1, construct XPath expressions that will do the following. Be sure to give the XPath expression you used in your answer, and don’t just report your results. This way, if the answer is incorrect, we can help explain what went wrong.

  1. This Site Index file organizes lists of proper names of various kinds. Before you begin, take a look at the outline view of the document to familiarize yourself with the structure of this file, and then work with XPath to answer the following:
    1. What XPath expression helps you to see all of the <div> elements in the document? (How many are there?)
    2. Lists of persons are coded in <listPerson> elements. What XPath expression shows you all the <listPerson> elements in the document?
    3. What XPath expression shows you which <div> elements contain child <listPerson> elements? (Use a predicate filter with square brackets [ ] to help you.) How many <div> elements contain <listPerson> elements inside?
    4. How can you change your XPath expression to return <div> elements that contain <listPlace> elements inside?
    5. Now, write an XPath expression to return all the <place> children of the <listPlace> elements. How many are there?
    6. What is the difference between these two XPath expressions?
      //place/placeName
      //place//placeName
      Enter the two and inspect the results. Why does the second expression return a larger number of results than the first?
  2. When exploring a document with XPath, sometimes we are trying to find out which elements have a certain value or property. If we want to return an element in a certain position without knowing its name, we can just designate any element with element() or *. So, for example, //* returns all 32,711 elements in this document. (Try it and see.) Using this information, answer the following:
    1. What XPath shows you all of the immediate children (whatever they are) of <div> elements?
    2. There is a list of animals in this document coded in <list sortKey="animals">. How can you return all the child elements of this particular list to see each of the animals? (How many are there?)
    3. Write a single XPath expression that returns all the different elements that hold a @sortKey attribute. (Our answer uses a predicate filter [ ].)
  3. This set of questions explores the <person> elements in the file.
    1. First, write an XPath expression that returns all the <person> elements. How many are there?
    2. Are all of the <person> elements coded with an @sex attribute? Use a predicate filter with [ ] to find out and record your expression here. How many results do you see?
    3. Apply the count() function to your previous expressions to return just a number in the XPath window.
    4. XPath can work like a calculator: It can handle simple arithmetic operations like add, subtract, multiply (with *), and divide (with the word div). Try writing an expression that returns the count() of person elements coded with @sex attributes divided by the count() of all the person elements. Multiply that by 100 to see a percentage: About what percentage of person elements are coded with @sex attributes in this document?
    5. Write an XPath expression to find the number of <person> elements coded as female with @sex="f".
    6. Use the count() function and division in XPath once again to find out the proportion of persons coded female among all the persons coded with @sex attributes.
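
The count-and-divide pattern in the last set of questions can be sketched on a toy document. This is only an illustration under stated assumptions: the catalog and book elements and the lang attribute are invented, and because Python's xml.etree.ElementTree does not implement the XPath count() function, len() stands in for it here. The comments give the equivalent XPath arithmetic.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy document: these names are invented and unrelated to the Site Index.
TOY = """
<catalog>
  <book lang="en"/>
  <book lang="fr"/>
  <book lang="en"/>
  <book/>
</catalog>
"""
root = ET.fromstring(TOY)

all_books = root.findall(".//book")              # XPath: //book
with_lang = root.findall(".//book[@lang]")       # XPath: //book[@lang]
english = root.findall(".//book[@lang='en']")    # XPath: //book[@lang='en']

# XPath: count(//book[@lang]) div count(//book) * 100
percent_with_lang = len(with_lang) / len(all_books) * 100

# XPath: count(//book[@lang='en']) div count(//book[@lang])
share_english = len(english) / len(with_lang)

print(percent_with_lang)  # 75.0
print(share_english)
```

Note which count goes in the denominator: the first calculation divides by all the book elements, while the second divides only by those that carry the attribute.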