Our task

The Graveyard project currently archives and shares personography data (or data about people) from the Brush Creek Cemetery’s records, specifically for a group of 142 burials in Section I of the graveyard. These burials are organized into family plots, and the records frequently (but not always) indicate a location of death. Thanks to the data curation of the Graveyard team, we are able to look up and plot information about each family represented in Section I and get a sense of the family’s geographic distribution from the locations of death associated with each last name (or surname). Running XPath over the Graveyard team’s data file tells us there are 56 distinct surnames among the 142 persons buried here. For this assignment, we will concentrate only on the larger families, those where the same surname is associated with three or more people. For each of these we will graph a total count of deaths per surname, and then superimpose stacked bars representing each regional death location and the count of deaths for that region.

Here is our sample output for the assignment. Yours may be styled differently, and you do not need to output the diagnostic information we did at the top of the plot (unless you want to do something similar). You should title and label your graph clearly and provide an explanation of colors (or textures) that you use to distinguish locations.

Access the Graveyard TEI personography file from our eXist database in this location:

doc("/db/graveyard/graveyardInfo-TEI.xml")

The personography file was prepared using TEI code. To read from the TEI and to output in the SVG namespace, you will need to declare your namespaces and work with the tei: prefix for all TEI elements.

xquery version "3.0";
declare default element namespace "http://www.w3.org/2000/svg"; 
declare namespace tei="http://www.tei-c.org/ns/1.0";

We should begin by surveying the personography file. Open it in eXide (with File → Open, and browse your way to it). Notice how the personography entries are organized, and see how the <surname> elements are positioned. Notice, too, how the <region> elements are nested inside the <death> element. Since the regions (US states or Canadian provinces) are more frequently shared and are easiest to understand, we will plot our stacked chart based on these elements (and bypass the cities encoded in <settlement>). Here is a sample entry, highlighting the elements we seek:

           <person xml:id="L12P1" role="occupant" sex="m">
            <persName><surname>Henderson</surname><forename type="first">James</forename></persName>
            <age>49</age>
            <death when="1931-05-21"><placeName><settlement type="city">Mann
                  County</settlement><region type="state">New York</region></placeName><note
                type="cause">unknown</note></death>
            <event type="interred" when="1931-05-24">
              <desc/>
            </event>
            <trait type="racial">
              <label>white</label>
            </trait>
            <geo><!--whitespace-separated geocoordinates look up how to do this in the TEI--></geo>
          </person>
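
To get a feel for the data before plotting anything, a survey query along these lines may help. It is only a sketch: it assumes each <person> holds one <surname> inside <persName> and that regions always sit somewhere inside <death>, as in the sample entry above. The variable names are ours, not part of the project.

```xquery
xquery version "3.0";
declare namespace tei="http://www.tei-c.org/ns/1.0";

(: Assumption: one surname per person, regions nested inside tei:death. :)
let $persons := doc("/db/graveyard/graveyardInfo-TEI.xml")//tei:person
for $s in distinct-values($persons/tei:persName/tei:surname/normalize-space())
let $family := $persons[tei:persName/tei:surname/normalize-space() = $s]
let $regions := distinct-values($family/tei:death//tei:region/normalize-space())
(: Keep only the larger families: three or more people sharing a surname. :)
where count($family) ge 3
order by count($family) descending
return concat($s, ': ', count($family), ' burials; regions: ', string-join($regions, ', '))
```

The output is a plain sequence of strings, one per larger family, which is a quick way to see how tall your bars and how varied your stacks will need to be.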

To plot your graph in SVG from XQuery, apply what you have been learning about SVG in the previous assignments. For example, when you plotted the timeline, you learned how to code a viewport in the SVG root element, and you learned how to plot from x=0 and y=0 so that your plot is visible in the SVG coordinate space, using transform="translate(x, y)". You also learned how to plot and space hashmarks at regular intervals along a line. At the very least you want to space bars on a bar graph at regular intervals, and draw X and Y axes based on maximum values multiplied by spacer variables that you set. Keep in mind that when you output multiple SVG elements in a return, you will want to bundle them together in a single group, or <g>. And don’t forget to use the tei: prefix when reaching into the TEI elements!
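
As a reminder of the grouping and spacing pattern described above, here is a small sketch that does not touch the database at all. The three labels, the spacer value, and the translate offsets are placeholders we made up for illustration.

```xquery
xquery version "3.0";
declare default element namespace "http://www.w3.org/2000/svg";

(: Hypothetical spacer: horizontal distance between one mark and the next. :)
let $X_Spacer := 50
return
<g transform="translate(50, 250)">
  {
    for $name at $pos in ('Alpha', 'Beta', 'Gamma')
    return
      (: Two elements come out of each pass, so we bundle them in one <g>. :)
      <g class="tick">
        <line x1="{$pos * $X_Spacer}" y1="0" x2="{$pos * $X_Spacer}" y2="10" stroke="black"/>
        <text x="{$pos * $X_Spacer}" y="25" text-anchor="middle">{$name}</text>
      </g>
  }
</g>
```

Multiplying the position variable by the spacer is what keeps the marks at regular intervals; the same idea can space the bars of a bar graph.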

How to make a stacked bar graph with XQuery

Draw your X and Y axes, and set up a Viewport

Work out your maximum values for X and Y and set a viewport with a width and a height, and then a viewBox attribute to scale your output if you wish.

Look at examples of how we prepared SVG Viewports in class, and check out Sara Soueidan’s excellent detailed explanation. Here is a brief overview of how to set the Viewport attributes on the SVG root element:

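Now, if I want to define how the image behaves on a screen, I define the viewBox attribute. viewBox takes four values, separated by whitespace or commas: viewBox="min-x min-y width height". The first two set the coordinates of the top-left corner of the visible region, and the last two set its width and height. Together they define a new coordinate system to use in rendering our output image.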
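
Here is one way the root element and axes might be set up from maximum values. Treat it as a sketch: the maximum values, the margin, and the variable names $maxX, $maxY, and $margin are assumptions for illustration, and you should replace them with values computed from your own data.

```xquery
xquery version "3.0";
declare default element namespace "http://www.w3.org/2000/svg";

(: Hypothetical values: number of bars, tallest bar, and scaling factors. :)
let $X_Spacer := 50
let $Y_StretchFactor := 20
let $maxX := 6
let $maxY := 10
let $margin := 60
let $plotWidth := $maxX * $X_Spacer
let $plotHeight := $maxY * $Y_StretchFactor
return
<svg width="100%" height="100%"
     viewBox="0 0 {$plotWidth + 2 * $margin} {$plotHeight + 2 * $margin}">
  <!-- Shift the origin so that (0,0) is the bottom-left corner of the plot. -->
  <g transform="translate({$margin}, {$plotHeight + $margin})">
    <line x1="0" y1="0" x2="{$plotWidth}" y2="0" stroke="black"/>
    <line x1="0" y1="0" x2="0" y2="-{$plotHeight}" stroke="black"/>
  </g>
</svg>
```

Because the Y axis in SVG runs downward, everything plotted above the X axis takes a negative y value once the origin has been translated.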

Plotting surnames and total bars

We recommend beginning by plotting each surname in a text element running beneath your X axis, and in the same X locations, plotting the total count of deaths per surname. Then we will go on to superimpose the stacked bars on top of that total bar. Not every death was recorded with a location, so our stacked bars should stack from the bottom up, and in many cases leave some room at the top for those whose locations at death were not marked.
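
A first pass at the surname labels and total bars could look something like the following. This is a sketch under assumptions: we count the <person> entries that share a surname as the "deaths per surname", and the global variable $bigSurnames, the bar width, and the translate offsets are our own choices.

```xquery
xquery version "3.0";
declare default element namespace "http://www.w3.org/2000/svg";
declare namespace tei="http://www.tei-c.org/ns/1.0";

declare variable $persons := doc("/db/graveyard/graveyardInfo-TEI.xml")//tei:person;
declare variable $X_Spacer := 50;
declare variable $Y_StretchFactor := 20;
(: Hypothetical global: surnames shared by three or more people. :)
declare variable $bigSurnames :=
  for $s in distinct-values($persons/tei:persName/tei:surname/normalize-space())
  where count($persons[tei:persName/tei:surname/normalize-space() = $s]) ge 3
  order by $s
  return $s;

<g transform="translate(50, 400)">
  {
    for $i at $pos in $bigSurnames
    let $total := count($persons[tei:persName/tei:surname/normalize-space() = $i])
    let $height := $total * $Y_StretchFactor
    return
      <g class="{$i}">
        <!-- y is the TOP edge of the bar, so it is the negative of the height. -->
        <rect x="{$pos * $X_Spacer}" y="-{$height}" width="20" height="{$height}"
              style="stroke: black; stroke-width: 1; fill: white"/>
        <text x="{$pos * $X_Spacer + 10}" y="20" text-anchor="middle">{$i}</text>
      </g>
  }
</g>
```

The inner FLWOR for the regional stacks would go inside the return of this loop, where it can still see $i and $pos.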

Plotting the stacks: a special cumulative for loop, or array

After you have output your surnames with their total counts, you will need to make an inner FLWOR, within the return statement of the “surname” FLWOR.

We found it helpful to store some arrays (or lists of values, which XQuery calls sequences) in global variables, and we looped through the arrays in FLWOR statements used to output the colors in our legend, as well as the regional bars. Later on we found it absolutely essential to make a special kind of array to properly calculate the Y position of each stacked bar associated with a surname.

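In order to stack bars you need to start each new bar where the previous bar ended. That means, if there are four bars to plot (bar 1, bar 2, bar 3, and bar 4), we have to plot like this: bar 1 begins at the X axis, bar 2 begins at the height of bar 1, bar 3 begins at the combined height of bars 1 and 2, and bar 4 begins at the combined height of bars 1, 2, and 3.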

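We need a way to keep a running total of the heights, so that as we loop through each region associated with a surname, we output the cumulative sum() of an array storing the heights from the previous loops together with the current one. (The current bar is included because the y attribute of an SVG <rect> marks its top edge, not its base.) Here is some code to show how we prepared an accumulator array: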

     let $matchesRegionList :=
          for $d in $distDeathsSurRegion
          where substring-before($d, '_') = $i
          return $d
     for $m at $posM in $matchesRegionList
     let $reg := tokenize(substring-after($m, '_'), '-')[1]
     let $count := tokenize(substring-after($m, '_'), '-')[last()]
     let $intCount := xs:integer($count)
     let $regYVal := $intCount * $Y_StretchFactor
     let $accumYVal :=
          for $a in (0 to $posM - 1)
          (:ebb: This very useful loop lets us look up the counts at the current $posM step and at each of the *previous* steps! :)
          let $accum := $matchesRegionList[$posM - $a]
          let $countAccum := (tokenize(substring-after($accum, '_'), '-')[last()], '0')[1]
          let $intCountAccum := xs:integer($countAccum)
          let $accumY := $intCountAccum * $Y_StretchFactor
          return $accumY
     let $accumPos := sum($accumYVal)
     let $cVal :=
          for $v in $colorStates
          where $reg = substring-before($v, '_')
          return substring-after($v, '_')
          (:ebb: Here we're looping over a global variable called $colorStates,
          and wherever its region substring matches our current region, we output its color value substring for use in coloring our region stacks. :)
     return
          <rect class="{$reg}_{$count}" x="{$pos * $X_Spacer}" y="-{$accumPos}" width="20" height="{$regYVal}" style="stroke: black; stroke-width:1; fill: {$cVal}"/>

Note that $i refers to the surname value from our outermost for loop (and $pos to its position), and $distDeathsSurRegion is a reference to another array we stored up in a global variable. In it we stored, for each distinct region within each surname, a string built with the concat() function: the surname, followed by an underscore ("_"), followed by the region, followed by a hyphen ("-"), followed by the count of deaths for that surname in that region. We are reaching up into that global variable, finding the strings whose surname portion matches our current $i surname, and outputting them as a smaller array, stored in $matchesRegionList.
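
The global variable described above might be built along these lines. This is a sketch and not necessarily how we wrote ours: the helper variable names $family, $r, and $n are made up, and it assumes regions are always encoded inside <death>.

```xquery
xquery version "3.0";
declare namespace tei="http://www.tei-c.org/ns/1.0";

let $persons := doc("/db/graveyard/graveyardInfo-TEI.xml")//tei:person
let $distDeathsSurRegion :=
  for $s in distinct-values($persons/tei:persName/tei:surname/normalize-space())
  let $family := $persons[tei:persName/tei:surname/normalize-space() = $s]
  where count($family) ge 3
  return
    (: One string per region in which this family recorded a death. :)
    for $r in distinct-values($family/tei:death//tei:region/normalize-space())
    let $n := count($family[tei:death//tei:region/normalize-space() = $r])
    return concat($s, '_', $r, '-', $n)
return $distDeathsSurRegion
```

Each string has the shape surname_region-count, which is why the stacking code can recover the surname with substring-before() and the region and count with tokenize(). Note that this format relies on surnames containing no underscore and regions containing no hyphen.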

Next we loop through the $matchesRegionList array and extract the substrings with the information we want for region ($reg) and deathcount ($count). We convert the count to an integer and multiply it by the stretch factor we set for plotting on our graph. And this is where we need to work out how far to adjust the y position of each bar, by looking up the values at each preceding position in our $matchesRegionList array. This line of code is vital:

for $a in (0 to $posM - 1)

Here we define a for to range over integers from 0 to $posM - 1. For each of these integers, whether it is 0, 1, 2, 3, etc., we subtract it from the current $posM, and we use the result as the position to retrieve from $matchesRegionList. So the loop visits the current position first (when $a is 0) and then steps back through each preceding position down to 1, outputting a scaled height each time. In our next variable, $accumPos, we add up the values of that array using the sum() function, which is designed to add up a set of values. We use $accumPos to set the y position (the top edge) of the current bar for a region in our graph.
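
To watch the accumulator at work without the database, you can try it on a made-up list. The three strings below are invented for illustration (they are not real Graveyard data), and the stretch factor of 20 is arbitrary.

```xquery
xquery version "3.0";

let $Y_StretchFactor := 20
(: Hypothetical data in the surname_region-count format. :)
let $matchesRegionList := ('Example_Pennsylvania-3', 'Example_Ohio-1', 'Example_New York-2')
for $m at $posM in $matchesRegionList
let $reg := tokenize(substring-after($m, '_'), '-')[1]
let $accumYVal :=
  (: Walk back from the current position to position 1, collecting heights. :)
  for $a in (0 to $posM - 1)
  let $accum := $matchesRegionList[$posM - $a]
  return xs:integer(tokenize(substring-after($accum, '_'), '-')[last()]) * $Y_StretchFactor
return concat($reg, ': y = -', sum($accumYVal))
(: returns "Pennsylvania: y = -60", "Ohio: y = -80", "New York: y = -120" :)
```

The first bar is 60 units tall with its top at -60; the second is 20 units tall and its top lands at -80, so it sits directly on the first; the third is 40 units tall with its top at -120.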

Colors: Storing and matching them with regions

The code block above shows how we associated colors with each region (where we work with $cVal). Notice that this involves a strategy similar to the one we used above: opening an inner for loop and finding where something in it matches something at the current position in the outer for loop. Here is how we recommend working with colors:

  1. Set up a global variable to store a series of color values for all of the regions you will need (all of the regions represented by the families with three or more deaths).
  2. Make another global variable that associates each color value with a region, using a set format. We recommend not doing this by hand, but by setting up a pair of for loops to run together. Set this up by walking through your array of the distinct values of regions for all families with three or more deaths, and set a position variable, thus: for $i at $pos in $distinctUsualRegions.
  3. Inside, set another for loop to walk through your array of colors, and set a similar position variable. (Notice that the number of values in each array will need to match exactly.) Where the position in the region loop matches the position in the color loop, return something that splices the two together, using a character of your choice to join them (say, a hyphen or an underscore). We use concat() for this. This makes an array of values that hold region and color information together, which you can access later as you plot your graph. And you can use it to plot a legend, too!
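
The three steps above can be sketched as follows. The region names and hex colors are placeholders we picked for illustration, and $colors and $posC are hypothetical names; in your own script the region list would come from the data, and these would be global variables.

```xquery
xquery version "3.0";
declare default element namespace "http://www.w3.org/2000/svg";

(: Hypothetical values: both sequences must be the same length. :)
let $distinctUsualRegions := ('Pennsylvania', 'Ohio', 'New York')
let $colors := ('#1b9e77', '#d95f02', '#7570b3')
let $colorStates :=
  for $i at $pos in $distinctUsualRegions
  for $c at $posC in $colors
  where $pos = $posC
  return concat($i, '_', $c)
(: $colorStates is now ('Pennsylvania_#1b9e77', 'Ohio_#d95f02', 'New York_#7570b3') :)
return
<g class="legend">
  {
    for $v at $posV in $colorStates
    return
      <g>
        <rect x="0" y="{$posV * 25}" width="20" height="20"
              style="stroke: black; fill: {substring-after($v, '_')}"/>
        <text x="30" y="{$posV * 25 + 15}">{substring-before($v, '_')}</text>
      </g>
  }
</g>
```

We joined region and color with an underscore here so that the result works with the substring-before() and substring-after() lookups on $colorStates shown in the stacking code.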

Your output

The dimensions and style of your plot are up to you, though we expect your output to be clearly labelled, so visitors to the Graveyard project will understand what they are seeing. Save your SVG output in your folder in eXist, but paste a copy of your XQuery script in a text file, save it according to our usual homework file naming conventions, and upload it to Courseweb.