XQuery Exercise with KML

Using XQuery and XSLT Together to Produce KML for Mapping

image of planet Earth surrounded by satellite debris, made viewable by space technology.

Our projects often require us to look up information to add to our markup, and sometimes we need to do that in bulk for a long list of items. Instead of looking up just one word in a dictionary, what if we need to look up 200 words? The internet makes this possible: with the XQuery and XSLT we are learning, we can output lists formatted for an Application Programming Interface, or API, which looks up the information we want and returns it to us in a new file. APIs are designed to help project developers like us aggregate information quickly! We can then write XSLT to adjust the markup or to output special forms of XML, HTML, or other "ML" markup formats that we need. One common application for this is mapping: we have marked place names in our project XML and want to plot them on a map, but we are missing their coordinates (latitude and longitude) and need a special format for mapping software. In the XML family of languages, KML is specially designed for display in mapping software such as Google Earth, Google My Maps, NASA WorldWind, and ESRI ArcGIS Explorer. This exercise gives you experience with using an API to look up geospatial data for place names you extract from XML, and then plotting that data, together with information from your XML, on Google Maps using KML.

What is KML?

KML (Keyhole Markup Language) is a form of XML designed to store and process location information with geospatial coordinates for visualization. Its odd name, “Keyhole,” is an homage to the early 1960s Key Hole spy satellites (so named because they were used to see into secret areas during the Cold War, like peering furtively through a keyhole into a private room). Our view of the Google Earth globe, and the extensive zooming and panning available in Google Maps, relies on satellite imaging technology that lets us pull back to view whole continents or zoom in close to a 3-dimensional street view.

With this exercise, you will gain experience with extracting data from multiple files, and with processing it in multiple stages of transformation, a kind of “pipeline” or chain of files we’ll create in both XQuery and XSLT. You’ll also gain experience with producing KML to view in Google Maps or Google Earth.

Anatomy of a KML file:

A KML file has the following basic structure. (There are many more elements that you may use, but for our purposes this is just a simple form of the file):

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark>
    <name>McKenna Hall, University of Pittsburgh at Greensburg</name>
    <description>where digital humanities students meet for class</description>
    <Point>
      <coordinates>-79.532951,40.275598</coordinates>
    </Point>
  </Placemark>
</kml>

The coordinates above give longitude followed by latitude. KML always lists longitude first, the reverse of the latitude-first order you often see elsewhere, so a swapped pair will drop your pin in the wrong place. The pair is sometimes followed by a third value (not included here) giving altitude in meters. The latitude and longitude coordinate system works like a grid wrapped around the planet and serves as the basis for mapping technologies, permitting us to drop pins on georeferenced images based on latitude and longitude numbers. Here is a quick orientation to latitude vs. longitude.

As a form of XML, KML offers a way to store information that map visualization platforms can read. You can view a very basic KML file in <oXygen/> by opening a new KML file (look in New → Framework Templates), and in this exercise we will process KML just as we do other kinds of XML files. Notice that its namespace is declared on the root element: <kml xmlns="http://www.opengis.net/kml/2.2">. That namespace identifies KML as an international standard of the Open Geospatial Consortium, that is, an open standard for geodata storage and retrieval.
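Because every KML element lives in the http://www.opengis.net/kml/2.2 namespace, an XQuery that reads KML has to bind a prefix to that namespace before a path step like Placemark will match anything. Here is a minimal XQuery 3.1 sketch, assuming an inline sample Placemark in place of a real file (in eXist-db you would load the file with doc() instead). The kml prefix is our own choice; any prefix bound to the same URI works. The sketch also splits the coordinates string, keeping in mind that KML lists longitude first.

```xquery
xquery version "3.1";

(: Bind a prefix to the KML namespace so that steps like kml:Placemark match. :)
declare namespace kml = "http://www.opengis.net/kml/2.2";

(: Hypothetical inline sample standing in for a KML file loaded with doc() :)
let $kml :=
  <kml xmlns="http://www.opengis.net/kml/2.2">
    <Placemark>
      <name>McKenna Hall, University of Pittsburgh at Greensburg</name>
      <Point>
        <coordinates>-79.532951,40.275598</coordinates>
      </Point>
    </Placemark>
  </kml>
for $pm in $kml//kml:Placemark
(: KML coordinates are "longitude,latitude[,altitude]" :)
let $parts := tokenize(normalize-space($pm//kml:coordinates), ',')
return
  <place name="{ $pm/kml:name }" lat="{ $parts[2] }" lon="{ $parts[1] }"/>
```

For this sample, the query returns one <place> element whose lat is 40.275598 and whose lon is -79.532951. Without the namespace declaration, a path such as $kml//Placemark would quietly return nothing.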

Overview of the assignment:

This exercise contains lots of small interlocking parts, so it will help you to read through this overview first before you begin working!

First, we need to extract the distinct locations tagged in a batch of XML project files. We have uploaded for this purpose a small subset of the XML files from the 2015 Church Schism student project into our eXist-db on newtFire, accessible from collection('/db/Schism')/*. The student project investigated news stories about a major conflict dividing Episcopalian churches and kept track of locations mentioned in the stories. We want to plot those locations on a map, along with some information about each one.

To begin, we will generate a plain-text list of the names, check them for errors that we can edit, and feed them into a map API, which will return a KML file containing their geographic coordinates. To create the file in the format the API needs, we will extract all the distinct values of the element <location> in use in these files and join them into one string with a line-break separator, so that each name appears on a separate line of our plain-text output. You will need the string-join() function, with the character reference for a line feed, &#10;, as its separator. To retrieve the location names, we found it helpful to set up the XQuery to save its output into eXist-db, and then download the file from the eXist directory.

We need the data in this format to input into a web form at GPS Visualizer: http://www.gpsvisualizer.com/geocoder/, which will look up and output latitude and longitude coordinates for each line of text. You will need a map API key to process more than five locations at a time, and for our class purposes we have set up a Bing Maps API key that you may use (copy the full line below):

5IOF5qZGB0AuA1ddG44IsOL1anHPfb7I

The keys to Bing, Open Maps, and Google Maps permit individuals to process info for some hundreds or perhaps a thousand places per day, and are a sort of filter to prevent abuse and overload of map APIs. You may wish to sign up for your own Basic key for use in your projects.

After you paste in the Bing Maps key, select “Bing Maps” as the source, and be sure “raw list, one address per line” is selected. Then paste your list of places into the window, and click the green “Start geocoding” button. Watch the output results! There are a few different ways to save your results here: just make sure you are retrieving a KML or a KMZ file. (KMZ is a zipped archive format with KML inside, and you can open that in oXygen.) See the “Help with the details” section below for guidance on how to save the output KML.

Download the KML file from the GPS Visualizer site, and save it so you can work with it. We will now modify it to include more information that we will extract via XQuery from the Church Schism project files. For this assignment, we want to capture the text of the longest paragraph surrounding each location and hold it in the placemarks of our KML, so that this text appears in a pop-up window when we click the location pins on our map.

To generate the locations with their associated longest paragraphs, we return to XQuery to output a file we will call Placeography.kml. This will follow a very simple KML structure: just location names and descriptions. So you will output the distinct-values() of <location> again, each followed by the longest paragraph in the project files that holds that location. (We can’t easily accommodate all the paragraphs that hold each place name, so for our purposes we’ll simplify matters and return only the longest paragraph that contains the location. Hint: that is the paragraph with the maximum string-length() among those containing the location.) It will have the following structure (including sample output here):

<kml>
  <Placemark>
    <name>Allegheny County</name>
    <description>An Allegheny County court awarded the Episcopal Diocese of Pittsburgh more than $15 million in endowments, bank accounts and other assets that a secessionist diocese had sought to retain. of the Court of Common Pleas in Allegheny County ruled yesterday that the assets -- although not necessarily buildings and land titled in the name of the parishes that seceded -- belong to the Diocese of Pittsburgh of the Church of the United States of America. Due to the litigation, the financial services firm froze diocesan trust funds pending a decision by an Allegheny County Common Pleas judge. The 600-member Church of the Ascension was denied more than $30,000 in promised grants, most of which were intended to start a mission church. </description>
  </Placemark>
</kml>

For the purposes of this homework exercise, when you output this KML file, name it Placeography.kml just for ease of reference to this assignment page. In your projects you may want to handle your data extraction differently than we do here (not necessarily outputting whole paragraphs, but perhaps other kinds of information correlated to the locations you want to plot on a map). In any case, you will want to think about what kinds of information you want to feature in the KML description element.

Now, we turn to <oXygen/> to write XSLT. We have some automatically generated KML that contains the coordinates we need to plot our maps, and we have a simple Placeography.kml. We are going to write an identity transformation stylesheet that braids the two files together, pulling the descriptions from our project-generated Placeography.kml into the GPS Visualizer KML. We will show you how to do that below.

We output our KML file from this identity transformation, save it and open it in <oXygen/> to be sure it is well-formed and valid. Then we input it into either Google Earth (if you have installed it on your computer or access it on our computer lab machines), or into Google Maps (for which you will need to access your Google account). To import a KML file into Google Earth, follow these instructions. Here is a view of some old sample output from this assignment: something like what this should look like when you’re done.

Help with the details:

The first task, outputting plain text with the distinct location names, simply involves using distinct-values() in a way that you’ve done before. (To check that you are getting the correct results: a count() of all the distinct location names currently coded in the Schism project files should return 46.) We want to output each location on its own line of plain text, using the string-join() function. Usually when we use string-join(), we give a comma or some other punctuation as the separator in the function’s second argument. This time, we’re using the character reference for the ASCII line-feed character: &#10;. Position this in your string-join() (in quotation marks) just as you would any other separator character. (Or you could define it as a variable in XQuery; when you call the variable inside string-join(), don’t use the quotation marks.)
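As a sketch of this first task, the XQuery below builds the line-separated list from two tiny inline sample stories that stand in for collection('/db/Schism')/* (the sample markup is hypothetical; in eXist-db you would query the collection directly). It holds the line-feed reference in a variable, the second option described above.

```xquery
xquery version "3.1";

(: Hypothetical sample stories standing in for collection('/db/Schism')/* :)
let $docs := (
  <story><p>News from <location>Pittsburgh</location> and <location>Allegheny County</location>.</p></story>,
  <story><p>Back to <location>Pittsburgh</location> again.</p></story>
)
(: distinct-values drops the repeated "Pittsburgh" :)
let $places := distinct-values($docs//location)
(: &#10; is the XML character reference for a line feed.
   Writing string-join($places, '&#10;') directly works the same way. :)
let $nl := '&#10;'
return string-join($places, $nl)
```

The result is a single string with one place name per line. Against the real collection, count(distinct-values(collection('/db/Schism')//location)) is the check mentioned above.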

NOTE: To save the output file in eXist-db, we found we needed to open the string-join() function before the first FLWOR statement and close it after the return, so that the whole FLWOR serves as its first argument and the line-feed separator as its second.
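Here is a sketch of that arrangement, assuming eXist-db, whose built-in xmldb:store() function writes a resource into the database and returns its path. The whole FLWOR sits inside string-join() as its first argument, and the result is stored as plain text. The target collection /db/apps/mywork and the file name schism-locations.txt are hypothetical: substitute a collection you have write access to. The order by clause is optional; it only alphabetizes the list.

```xquery
xquery version "3.1";

(: eXist-db only. '/db/apps/mywork' is a hypothetical collection;
   use one that exists and that you are allowed to write to. :)
let $list :=
  string-join(
    for $place in distinct-values(collection('/db/Schism')//location)
    order by $place
    return $place,
    '&#10;'
  )
return xmldb:store('/db/apps/mywork', 'schism-locations.txt', $list, 'text/plain')
```

Because a FLWOR's return clause takes a single expression, the comma after return $place ends the FLWOR and starts string-join()'s second argument.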

Copy this XQuery into a text file to save, as one of the files you will upload to Courseweb for this homework assignment.

For the second task, copy the list of names from your output text file and paste it into the GPS Visualizer tool (after entering the Bing Maps key). Be sure you have selected Bing Maps as your source (so you can use our key), and click the green “Start geocoding” button. Wait for the results to appear in the text window: you will see lists of latitude and longitude pairs followed by place names and other information.

We would like to output this information as a KML document that we can transform for ourselves in <oXygen/>. To do that, change the output format to the right of the results box from “Google Maps” to “KML (G. Earth)”. Then, click on the link labelled in green in square brackets: [more map options].

This brings up a page titled “Convert your GPS data for use in Google Earth.” In the “General map parameters” area, change the Output file type on the dropdown menu from the default .kmz (zipped) to the form we want: .kml (uncompressed). Notice that you can make adjustments here to your output data. The KML file you create will be something you can import into Google Earth or Google Maps or other mapping software if you would like to see what you have produced so far. As you learn more about how to fine-tune your KML output and control the look of your maps in mapping software, you may find you want to make some adjustments on this screen to your output, or code other things yourself working in <oXygen/>. (Feel free to tinker with the default icon and icon color, etc.)
When you are ready, click the button in the lower right of your screen: “Create KML file.” You will see that a file is generated with a long name (beginning with today’s date) and a .kml extension. Click to download it, and save it in a folder where you will keep the other files for this mapping assignment. (You will need to save your XSLT in this folder, as well as one other KML file you are about to produce. All of these files must sit in the same directory so that we can process them together.)

You have produced one KML file, or rather you have let GPS Visualizer do it for you. Now, we want to create a very simple KML file of our own. Go back into eXist-db, and write a new XQuery file (building on the previous XQuery that you wrote, if you like). This time, we need to output a file that contains KML’s structure of <Placemark> elements (KML element names are case-sensitive), each with two elements inside: a <name> element and a <description> element. You have learned to write XQuery to output HTML, with the root element’s start tag at the top and its end tag at the bottom, and curly braces { } to set off the XQuery FLWOR statements and their itemized output. This time, you need to produce KML output, which is much simpler than HTML. Create a root element at the top of your file: <kml>, and close it at the end of your file: </kml>. In between, set up your curly braces { }, and write a FLWOR with a return statement in this form:

<kml>
{
  (: FLWOR statements :)
  return
    <Placemark>
      <name>{ }</name>
      <description>{ }</description>
    </Placemark>
}
</kml>

Your FLWOR statement will need to output, in the kml name element, each one of the distinct-values() of the location element (using that range variable you worked with in the previous XQuery Exercise 2).
In the description element you need to define a variable to look up the longest ancestor paragraph that holds each location (coded as <p> in the XML files of the Schism collection). Return the string() value of that paragraph so we don’t bother with outputting its internal tags: we just want its plain text.
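A minimal sketch of this FLWOR, assuming a small hypothetical inline story instead of the real collection (swap $docs for collection('/db/Schism')/* in eXist-db). It gathers the <p> ancestors of every matching <location>, finds the greatest string-length() among them, and returns the string() of a paragraph of that length. The [1] tie-breaker is our own choice, for locations where two paragraphs happen to be equally long.

```xquery
xquery version "3.1";

(: Hypothetical sample standing in for collection('/db/Schism')/* :)
let $docs := (
  <story>
    <p>A short note on <location>Pittsburgh</location>.</p>
    <p>A longer paragraph that mentions <location>Pittsburgh</location> and also <location>Allegheny County</location>.</p>
  </story>
)
return
<kml>
{
  for $place in distinct-values($docs//location)
  (: every paragraph that holds a location element with this exact text :)
  let $paras := $docs//location[. = $place]/ancestor::p
  let $longest := max($paras ! string-length(.))
  return
    <Placemark>
      <name>{ $place }</name>
      <description>{ string($paras[string-length(.) = $longest][1]) }</description>
    </Placemark>
}
</kml>
```

Here both places get the second, longer paragraph as their description, since it is the longest paragraph containing each of them.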
When you are generating output that makes sense, do the following:
- Write your XQuery to save its output to a file (as you did earlier).
- Copy and paste your XQuery into a text file with something new in the title (like Beshero-Schism-pt2) and save it to upload to Courseweb as part of this homework assignment.
- Save your output KML file, and then download it from eXist-db. Save your new KML file (for the purposes of our assignment) as Placeography.kml in the same folder with the KML file you generated with the GPS Visualizer. Open it in <oXygen/>, and add its proper KML namespace to the root element, thus:
  <kml xmlns="http://www.opengis.net/kml/2.2">
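  This is very important! The XSLT we use below treats the KML namespace as the default for its path expressions, so without that namespace declaration it cannot match the elements in this file against those in the KML produced by GPS Visualizer. As soon as you add the namespace, you will see validation errors flagged in your KML (for instance, KML allows only one feature directly inside <kml>, so a series of <Placemark> elements would normally need a <Document> wrapper). That is okay, because we will simply use this file to feed data into the valid KML produced by GPS Visualizer.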
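If you would rather have XQuery write the namespace for you, you can put it on the <kml> constructor itself. There is one catch, sketched below under the assumption that the Schism markup has no namespace (again with a hypothetical inline sample and only the <name> element, for brevity): in XQuery, xmlns="..." on a direct element constructor also becomes the default element namespace for every path expression inside its braces, so //location would silently match nothing. The *: wildcard matches a local name in any namespace, including none.

```xquery
xquery version "3.1";

(: Hypothetical sample standing in for collection('/db/Schism')/* :)
let $docs := <story><p>Court news from <location>Allegheny County</location>.</p></story>
return
<kml xmlns="http://www.opengis.net/kml/2.2">
{
  (: Inside this constructor an unprefixed name such as location
     would mean the KML namespace, so *:location is used to reach the
     un-namespaced source elements. Placemark and name, by contrast,
     are created in the KML namespace. :)
  for $place in distinct-values($docs//*:location)
  return
    <Placemark>
      <name>{ $place }</name>
    </Placemark>
}
</kml>
```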

Now, you have made not one, but two KML files. We want to take the <description> elements you generated with the longest paragraphs and merge them into the KML file that you generated with GPS Visualizer. We will do that by writing an identity transformation stylesheet in XSLT. You will be transforming the GPS Visualizer KML into a new, slightly modified KML that draws from your second KML file. This will introduce you to some XSLT coding that you have not seen before, which resembles what we do when drawing on multiple files in XQuery: in XSLT, too, you can work with multiple files. Because this is new and unusual, we want to show you how it works, so we give you our code with comments. You can pull it in from the DHClass-Hub and find it in the KML folder inside Class Examples, or download it directly from the web here: our KML to KML Identity Transformation XSLT file. Retrieve this file so you can open it in <oXygen/>. Here is a view of the code:

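<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math"
    xmlns="http://www.opengis.net/kml/2.2"
    xpath-default-namespace="http://www.opengis.net/kml/2.2"
    exclude-result-prefixes="xs math"
    version="3.0">

    <xsl:output method="xml" indent="yes"/>

    <!-- Load Placeography.kml from the same folder as this stylesheet.
         Its root must carry the KML namespace for the paths below to match. -->
    <xsl:variable name="PO" select="document('Placeography.kml')" as="document-node()"/>

    <!-- Identity transformation: anything without a more specific
         template is copied through unchanged. -->
    <xsl:mode on-no-match="shallow-copy"/>

    <!-- Rebuild each GPS Visualizer Placemark: the new <name> first,
         then the new <description>, then everything that followed
         the original <description>. -->
    <xsl:template match="Placemark">
        <Placemark>
            <xsl:apply-templates select="description"/>
            <xsl:apply-templates select="name"/>
            <xsl:apply-templates select="description/following-sibling::*"/>
        </Placemark>
    </xsl:template>

    <!-- The text of the Placemark's GPS Visualizer <description> becomes
         the new <name>. Matching Placemark/description leaves any
         Document-level <description> untouched. -->
    <xsl:template match="Placemark/description">
        <name><xsl:value-of select="."/></name>
    </xsl:template>

    <!-- Replace the Placemark's <name> with a <description> holding the
         Placeography paragraph whose <name> matches it. current() is the
         GPS Visualizer <name> being processed. -->
    <xsl:template match="Placemark/name">
        <description>
            <xsl:value-of select="$PO//name[. = current()]/following-sibling::description"/>
        </description>
    </xsl:template>

</xsl:stylesheet>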
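For comparison, here is roughly the same cross-document lookup written in XQuery, with the two KML files replaced by hypothetical inline samples (built from the McKenna Hall example earlier on this page) so the sketch runs on its own. It is a simplified illustration, not a replacement for the stylesheet: it keeps each <name> as it is rather than swapping in the GPS Visualizer <description>, and simply joins the two documents on matching names.

```xquery
xquery version "3.1";

declare namespace kml = "http://www.opengis.net/kml/2.2";

(: Hypothetical inline stand-ins for Placeography.kml and the GPS Visualizer KML :)
let $po :=
  <kml xmlns="http://www.opengis.net/kml/2.2">
    <Placemark>
      <name>McKenna Hall, University of Pittsburgh at Greensburg</name>
      <description>where digital humanities students meet for class</description>
    </Placemark>
  </kml>
let $gps :=
  <kml xmlns="http://www.opengis.net/kml/2.2">
    <Placemark>
      <name>McKenna Hall, University of Pittsburgh at Greensburg</name>
      <Point><coordinates>-79.532951,40.275598</coordinates></Point>
    </Placemark>
  </kml>
return
<kml xmlns="http://www.opengis.net/kml/2.2">
{
  for $pm in $gps//kml:Placemark
  (: The join: find the Placeography name equal to this one and take the
     description after it, as the XSLT does with
     $PO//name[. = current()]/following-sibling::description :)
  let $desc := $po//kml:name[. = $pm/kml:name]/following-sibling::kml:description
  return
    <Placemark>
      { $pm/kml:name }
      <description>{ string($desc[1]) }</description>
      { $pm/kml:Point }
    </Placemark>
}
</kml>
```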

Import your file into Google Earth or Google Maps as described in the overview above. Do you see the output? Can you click on the placemarks to view the paragraphs you extracted from the project files? (Again, to import a KML file into Google Maps, follow these instructions. To view in Google Earth, simply open Google Earth and open the file within it.)

Upload your two XQuery files, and each of the KML files you generated (three of them) to Courseweb for this assignment. Voila!

For more on mapping with KML: