XQuery Exercise with KML for Mapping
Using XQuery and XSLT Together to Produce KML for Mapping
image of planet Earth surrounded by satellite debris, made viewable by space technology.
Our projects often require us to look up information to add to our markup, and sometimes we want to do that in bulk
for a long list of items. Instead of looking up just one word in a dictionary, what if we need to find 200 words? The internet makes this possible for us, and using the XQuery and XSLT we are learning, we can output lists formatted for lookup in an Application Programming Interface, or API, which retrieves the information we want and returns it to us in a new file. API technology is designed to help us as project developers aggregate information quickly! We can write XSLT to adjust the markup or output special forms of XML, HTML, or other "ML" markup formats that we need. One common application for this is mapping, when we have marked place names in our project XML and want to plot them on a map, but are missing the mapping coordinates (latitude and longitude) and need a special format for mapping technology. In the XML family of languages, KML is specially designed for display in mapping software such as Google Earth or Google My Maps, NASA WorldWind, and ESRI ArcGIS Explorer. This exercise gives you experience with API technology to look up geospatial data that you extract from XML, and then plot that data, together with information from your XML, on Google Maps using KML.
What is KML?
KML is KML (Keyhole Markup Language), a form XML designed to store and process location information with geospatial coordinates for visualization. Its odd name, “Keyhole,” is an homage to the early 1960s Key Hole spy satellites (so named because they were being used to see into secret areas during the Cold War, like peering furtively through a keyhole into a private room). Our view of the Google Earth globe and the extensive zooming and panning visibility available in Google Maps is
reliant on satellite imaging technology that allows us to pan back from a distance
to view continents or pull up close to a 3-dimensional street view.
With this exercise, you will gain experience with extracting data from multiple files,
and with processing it in multiple stages of transformation, a kind of “pipeline”
or chain of files we’ll create in both XQuery and XSLT. You’ll also gain experience
with producing KML to view in Google Maps or Google Earth.
Anatomy of a KML file:
A KML file has the following basic structure. (There are many more elements that you
may use, but for our purposes this is just a simple form of the file):
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Placemark>
<name>McKenna Hall, University of Pittsburgh at Greensburg</name>
<description>where digital humanities students meet for class</description>
<Point>
<coordinates>40.275598,-79.532951</coordinates>
</Point>
</Placemark>
</kml>
The coordinates in the above mark latitude followed by longitude, and they are sometimes followed by a third unit (which we did not list here) indicating elevation above (or below) sea level. The latitude and longitude coordinate system works like a wrapped grid around the planet and serves as the basis for mapping technologies, permitting us to drop pins on georeferenced images based on latitude and longitude numbers. Here is a quick orientation to latitude vs. longitude..
As a form of XML, KML offers a way to store information that can be read by map visualization platforms. You can view a very basic KML file in <oXygen/> when
you open a new KML file (look in New → Framework Templates), and we will be processing KML in this exercise just as we do other kinds of XML files. Notice that its namespace is <kml xmlns="http://www.opengis.net/kml/2.2"> . That namespace represents KML as an international standard of the Open Geospatial Consortium, that is, an open standard for geodata storage and retrieval.
Overview of the assignment:
This exercise contains lots of small interlocking parts, so it will help you to read through this overview first before you begin working!
- First, we need to extract the distinct locations tagged in a batch of XML project
files. We have uploaded for this purpose a small subset of the XML files from the 2015 Church Schism student project into our eXist-db on newtFire, accessible from collection('/db/Schism')/*. The student project investigated news stories about a major conflict dividing Episcopalian churches and kept track of locations mentioned in the stories. We want to help plot those locations with some information about each one on a map.
- To begin, we will generate a plain-text list of the names, check them for errors that we can edit, and feed them into a map API which will return us a KML file containing their geo-coordinates. To create the file in the format we need to give the API, we will extract all the distinct values of the element <location> in use in these files, and we need to generate our output in a string-joined list with a line-break separator, so each name appears on a separate line in our plain text output.
You will need the string-join() function, and use the using a special ASCII character
code for a line feed: . To retrieve the location names from this file, we found it helpful to set up the XQuery to save the output into eXist-db, and then we downloaded it from the eXist directory.
- We need the data in this format to input into a web form at GPS VIsualizer: http://www.gpsvisualizer.com/geocoder/, which will look up and output latitude and longitude coordinate data for each line
of text. You will need to use a map API key to process more than five locations at
a time, and for our class purposes we have set up a key to Bing Maps API that you
may use (copy the full line below):
5IOF5qZGB0AuA1ddG44IsOL1anHPfb7I
The keys to Bing, Open Maps, and Google Maps permit individuals to process info for
some hundreds or perhaps a thousand places per day, and are a sort of filter to prevent
abuse and overload of map APIs. You may wish to sign up for your own Basic key for
use in your projects.
After you paste in the Bing Maps key, select “Bing Maps" as the source, and be sure “raw list, one address per line” is selected. Then paste your list of places into the window, and click the green “Start geocoding” button. Watch the output results! There are a few different ways to save your results here: just make sure you are retrieving a KML or a KMZ file. (KMZ is a zipped archive format with KML inside, and you can open that in oXygen.) See the “Help with the Details” section below for guidance on how to save the output KML.
- Download the KML file from the GPS Visualizer site, and save it to work with it.
We will now make some changes to it so that it includes more information that we will
extract via XQuery from the Church Schism project files. For this assignment, we would
like to capture the text of the longest paragraph surrounding each location to hold in the placemarks of our KML, so that this text appears in a pop-up window when we click the location pins on our map.
- To generate the locations with their associated longest paragraph, we return to XQuery to output a file we will call Placeography.kml. This will follow a very simple structure of a KML document: just location names and
descriptions. so you will output the distinct-values of <location> again, followed by the longest paragraph in the project files that holds this location information. (We can’t easily accommodate
all the paragraphs that hold this place name, so for our purposes, we’ll simplify
matters and just return only the longest paragraph that introduces the location. (Hint: That is the paragraph with the maximum string-length containing the location.) It
will have the following structure (including sample output here):
<kml>
<Placemark>
<name>Allegheny County</name>
<description>An Allegheny County court awarded the Episcopal Diocese of Pittsburgh
more than $15 million in endowments, bank accounts and other assets that a secessionist
diocese had sought to retain.
of the Court of Common Pleas in Allegheny County ruled yesterday that the
assets -- although not necessarily buildings and land titled in the name of the parishes
that seceded -- belong to the Diocese of Pittsburgh of the Church of the United
States of America. Due to the litigation, the financial services firm froze diocesan
trust funds pending a decision by an Allegheny County Common Pleas judge. The 600-member
Church of the Ascension was denied more than $30,000 in promised grants, most of which
were intended to start a mission church.
</description>
</Placemark>
</kml>
For the purposes of this homework exercise, when you output this KML file, name it Placeography.kml just for ease of reference to this assignment page. In your projects you may want
to handle your data extraction differently than we do here (not necessarily outputting
whole paragraphs, but perhaps other kinds of information correlated to the locations
you want to plot on a map). In any case, you will want to think about what kinds of
information you want to feature in the KML description element.
- Now, we turn to <oXygen/> to write XSLT. We have some automatically generated KML
that contains information we need to plot our maps, and we have a simple Placeography.kml.
We are going to write an identity transformation stylesheet that will braid the two files together, to pull the GPS Visualizer KML data into our project-generated KML. We will show you how to do that below.
- We output our KML file from this identity transformation, save it and open it in <oXygen/>
to be sure it is well-formed and valid. Then we input it into either Google Earth
(if you have installed it on your computer or access it on our computer lab machines),
or into Google Maps (for which you will need to access your Google account). To import
a KML file into Google Earth, follow these instructions. Here is a view of some old sample output from this assignment: something like what this should look like when you’re done.
Help with the details:
The first task, to output plain text with distinct location names simply involves
using distinct-values() in a way that you’ve done before. (To be sure that you are getting the correct results,
if you do a count() of all the distinct location names coded as of now in the Schism
project files, the number should be 46.) We want to output a new location on each line of plain
text output, using the string-join() function. Usually when we use string-join(), we indicate a comma or some form of punctuation as our separator character for the
second argument of the function. This time, we’re using a special ASCII character
code for a line feed: . Position this in your string-join() (using quotation marks) just as you would any
other separator character. (Or you could define it as a variable in XQuery, and then when you call the variable
inside string-join(), don’t use the quotation marks.)
NOTE: In order to save the output file in eXist-db, we need to set up the string-join()
function so that we started it before the first FLWOR statement, and end it so that its arguments were in the return.
Copy this XQuery into a text file to save, as one of the files you will upload to
Courseweb for this homework assignment.
- For the second task, copy the list of names from your output text file,
and paste your list of location names into the GPS Visualizer tool (after inputting the Bing Maps Key), be sure you have selected Bing Maps as your
source (so you can use my key) and click the green “Start geocoding” button. Wait
for the results to come out in the text window: You will see lists of latitude and
longitude pairs followed by place names and other information.
-
We would like to output this information as a KML document that we can transform
for ourselves in <oXygen/>. To do that, change the output format to the right of the results box from “Google Maps” to “KML (G. Earth)”. Then, click on the link labelled in green in square brackets: [more map options].
- This brings up a page titled “Convert your GPS data for use in Google Earth.” In the “General map parameters” area, change the Output file type on the dropdown menu from the default .kmz (zipped) to the form we want: .kml (uncompressed). Notice that you can make adjustments here to your output data. The KML file you
create will be something you can import into Google Earth or Google Maps or other
mapping software if you would like to see what you have produced so far. As you learn
more about how to fine-tune your KML output and control the look of your maps in mapping
software, you may find you want to make some adjustments on this screen to your output,
or code other things yourself working in <oXygen/>. (Feel free to tinker with the
default icon and icon color, etc.)
- When you are ready, click the button in the lower right of your screen: “Create KML
file.” You will see a file is generated with a long name (beginning with today’s date)
and a .kml extension. Click to download and save the file in a folder into which you
will save other files related to this mapping assignment. (You will need to save your
XSLT in this folder, as well as one other KML file you are about to produce. It is
important that all of these files are sitting in the same directory in relation to
each other, for us to process them together.)
- You have produced one KML file, or rather you have let the GPS Visualizer do it for
you. Now, we want to create a very simple KML file of our own. Go back into eXist-db,
and write a new XQuery file (building on the previous XQuery that you wrote,
if you like). This time, we need to output a file that contains KML’s structure of
<placeMark> elements, with two elements inside: a <name> element and a <description>
element. You have learned to write XQuery to output HTML, with a root element at the
top and bottom, and curly braces { } to set off the XQuery FLWOR statements and their itemized output. This time, you
need to produce kml output, which is much simpler than HTML. Create a root element at the top of your
file: <kml>, and close it at the end of your file: </kml>. In between, set up your curly braces { }, and write a FLWOR with a return statement in this form:
<kml>
{
(: FLWOR statements :)
return
<Placemark>
<name>{<!--something here to process in XQuery!--> }</name>
<description>{<!--something here to process in XQuery!-->}
</description>
</Placemark>
}
</kml>
- Your FLWOR statement will need to output, in the kml name element, each one of the distinct-values() of the location element (using that range variable you worked with in the previous XQuery Exercise 2).
- In the description element you need to define a variable to look up the longest ancestor paragraph that holds each location (coded as <p> in the XML files of the Schism collection). Return the string() value of that paragraph so we don’t bother with outputting its internal tags: we just want its plain
text.
- When you are generating output that makes sense, do the following:
- Now, you have made not one, but two KML files. We want to take that description element you generated with the longest p output and merge it with the KML file that you generated from the GPS Visualizer.
We will do that using XSLT to write an Identity Transformation stylesheet. You will be transforming the GPS Visualizer KML into a new and slightly
modified KML, that draws from your second KML file. This will introduce you to some
coding in XSLT that you have not seen before, a kind of coding that resembles what
we do with drawing on multiple files in XQuery. In XSLT, too, you can work with multiple
files. Because this is new and unusual, we want to show you how it works, so we give
you our code with comments: You can pull it in from the DHClass-Hub, and find it in the KML folder inside Class Examples, or access it directly to download from the web here:
our KML to KML Identity Transformation XSLT file. Retrieve this file so you can open it in <oXygen/>. Here is a view of the code:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xpath-default-namespace="http://www.opengis.net/kml/2.2"
xmlns:math="http://www.w3.org/2005/xpath-functions/math" exclude-result-prefixes="xs
math"
xmlns="http://www.opengis.net/kml/2.2"
version="3.0">
<!--ebb: This is an identity-transformation stylesheet designed to transform KML into
KML, and to draw from a second KML file to import information.-->
<!--ebb: Note that we need to add an XPath default namespace for KML (for input)
and an xmlns for the output KML, too. All files we’re processing need to be in KML
format: note the two separate declarations of KML in the <xsl:stylesheet> element.
(The output method has to be xml because there's no recognized "kml" value for the
@method on xsl:output.)
-->
<xsl:output method="xml" indent="yes"/>
<xsl:variable name="PO" select="document('Placeography.kml')" as="document-node()"/>
<!--ebb: This variable actually invokes a second document that we're going to pull
into this transformation! We will be able to process and mingle information from more
than one file by establishing variables like this, set at the document node (or root
element) of the file. We'll invoke it below using $PO. -->
<xsl:mode on-no-match="shallow-copy"/>
<!--ebb: This code (above) matches and copies all the elements of the GPS visualizer KML file that we select as input, and essentially makes this an Identity Transformation stylesheet.-->
<xsl:template match="Placemark">
<Placemark>
<xsl:apply-templates select="description"/>
<xsl:apply-templates select="name"/>
<xsl:apply-templates select="description/following-sibling::*"/>
</Placemark>
</xsl:template>
<xsl:template match="description">
<name><xsl:value-of select="."/></name>
</xsl:template>
<!--ebb: In the initial output from GPS Visualizer, we discover that the contents
of the <description> element contains a little more detail than the original value
we input for <name>. So we write a template rule to match on <description> and output
it as <name>, and we’ll suppress the original <name> element from appearing in the
output with the next template rule. -->
<xsl:template match="name">
<description>
<xsl:value-of select="$PO//name[. = current()]/following-sibling::description"/>
</description>
</xsl:template>
<!--ebb: With this template rule we're invoking a second file, signalled by the variable
$PO which we defined at the top of our XSLT document. Note that we are establishing
a match-up here between the two files, so we need to use both dot (.)
and current()
to be very precise about our match-up: The dot (.)
refers to wherever you are at
the moment in your current path, so it’s referring in-the-moment to the most
immediate reference point: the $PO//name. We need current()
to signal where we are with the template match, the current context node of the template match in our main
import file from the GPS Visualizer.
-->
</xsl:stylesheet>
- Import your file into Google Earth or Google Maps as described in the overview above. Do you see the output? Can you click on the placeMarks
to view the paragraphs you extracted from the project files?
(Again, to import a KML file into Google Maps, follow these instructions. To view in Google Earth, simply open Google Earth and open the file within it.)
- Upload your two XQuery files, and each of the KML files you generated (three of them)
to Courseweb for this assignment. Voila!
For more on mapping with KML:
To learn more about how to work with KML and its many options for laying out and connecting
points, drawing shapes on maps, using image overlays, etc., see the Google Developers KML Tutorial. There are a few options for embedding a Google Map on a website using iframes (find the embed link on the generated map). We’ll help you with this, but first spend some time looking at map output
in various views and decide what views make the most sense for your project.