The Fall 2020 DIGIT 400 James Bond project team has prepared XML for the screenplay Goldeneye, which you can access by right-clicking on the file and downloading it from here: Goldeneye.xml. Open the file in oXygen and work with the XPath Window set to version 3.1. Respond to the XPath questions below in a text or markdown file, and upload it to Canvas for this assignment when you’re finished. (Please use an attachment! If you paste your answer into the text box, Canvas may munch the code formatting.) Some of these tasks are thought-provoking, and even difficult. If you get stuck, do the best you can, and if you can’t get a working answer, give the expressions you tried and explain where they failed to get the results you wanted. Sometimes doing that will help you figure out what’s wrong, and even when it doesn’t, it will help us identify the difficult moments.

You should consult The XPath Functions We Use Most page and especially its section 4 on Strings. As always, consult our class notes and our introductory guide Follow the XPath!. Be sure to give the XPath expression you used in your answer, and don’t just report your results. This way, if the answer is incorrect, we can help explain what went wrong.


First of all, skim through the document to get a sense of how it is coded. To warm up and familiarize yourself with the file, write XPath expressions that find all the scenes, stage directions, speeches, and speakers.

  1. Let’s start by exploring the sd elements. These contain the stage directions.
    1. What XPath expression returns all the stage directions that contain the word "Russian"? How many are there?
    2. Some of the stage directions contain words emphasized in block caps. Write an XPath expression using the matches() function to locate the stage directions that hold a regular expression pattern of three or more capital letters in a row.
    3. There is usually a pretty important stage direction after a scene change. Every scene change comes with a Heading element. How can you reliably find the first stage direction immediately following that Heading element? (Hint: our solution uses the following-sibling:: axis and a position predicate to indicate the first in a sequence.)
    4. Of these stage directions that come immediately following Heading elements, how can you find out which ones contain a reference to the character "Q"? (Hint: add a predicate.)
  2. This set of questions explores what you can find out with the XPath string-length() function, which indicates the number of characters in the XML node that you visit.
    1. Write an XPath expression that returns the string-length() of all the stage directions coded in sd elements.
    2. Now, send those results to the max() function to find out the longest length of a stage direction in the Goldeneye script.
    3. The string-length() and max() functions took us off the XML tree to yield calculated results. How can we write XPath to return the XML element sd that has the maximum string-length()? Hint: Try searching for sd elements with a predicate that checks to see if the string-length() is equal to the maximum string-length you found in the previous step.
    4. Carefully rewrite your previous expressions to return speech elements this time. What XPath expression returns the shortest speech in the screenplay, and what is said in that speech?
  3. This last set of questions explores the spk elements, which identify the speakers.
    1. Notice how spk elements are nested as children inside the sp elements. Write an XPath expression to return all the speakers (spk) who deliver speeches that contain the word "Iraq".
    2. All the spk elements are entered in block caps. Use the XPath lower-case() function to return all the spk elements lower-cased instead and record your expression.
    3. We don’t really want to make the speakers' names all lower-case. We just want to lower-case the letters after the first one, to change BOND to Bond. We can do that kind of string-surgery in XPath by working with substrings. Consult this page to learn about the XPath substring() function and see how to write it out. Now, see if you can apply the substring() function to isolate the 2nd letter onward in the spk elements. Then, lower-case() that substring!
    4. Now, if you could apply the substring() to isolate letters 2 to the end, you should be able to change it to return only the very first letter. Try it and record your expression.
    5. One last challenge. If we can isolate part of the speakers' names to lower-case the 2nd letter to the end, we should be able to connect the first (capital) letter to the rest of the lower-cased letters. For this we want to use the XPath concat() function, and there is a convenient shorthand for it in XPath 3.1, which sets two vertical bars || between the expressions you want to connect. However, we need to be careful because concatenation requires joining exactly one thing to exactly one other thing. (XPath can't figure out on its own how to concat, or tie together, the whole sequence of first-letter substrings to the whole sequence of rest-of-the-name substrings.) To help XPath work one at a time over a sequence of spk substrings, look up the for $i in (sequence) return ... XPath expression. (This is a for-loop in XPath, and $i is known as a range variable that isolates each member of the sequence, one by one.) With the for-loop, you can go one step at a time through the sequence of //spk nodes and return a concatenation of the substring functions you figured out, using $i as the first argument of your substring functions. See if you can work out how to write this XPath.
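
If the string functions in the last set of questions are unfamiliar, it may help to try them on plain literal strings before pointing them at the file. The sketch below is only an illustration: the strings "GOLDENEYE", "ALPHA", and "BRAVO" are made-up sample values, not nodes from Goldeneye.xml, and the whole thing is written as one parenthesized sequence so that it can be pasted into the XPath 3.1 window in a single go. Adapting the pattern to the elements in the screenplay is still up to you.

```xpath
(
  (: substring() with two arguments: from the 2nd character to the end :)
  substring("GOLDENEYE", 2),                    (: "OLDENEYE" :)

  (: substring() with three arguments: start at 1, take 1 character :)
  substring("GOLDENEYE", 1, 1),                 (: "G" :)

  (: wrap one function around another, working from the inside out :)
  lower-case(substring("GOLDENEYE", 2)),        (: "oldeneye" :)

  (: the for-loop hands || exactly one string at a time via $i :)
  (for $i in ("ALPHA", "BRAVO")
   return substring($i, 1, 1) || lower-case(substring($i, 2)))
                                                (: "Alpha", "Bravo" :)
)
```

Notice that the range variable $i stands in for one member of the sequence on each pass, which is what lets || join exactly one string to exactly one other string.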