Corpus Analysis Assignment

Overview

This is a comparative analysis assignment, which means you will be comparing two things. For this assignment, you will be exploring how two related texts (or sets of texts) of your choice compare with one another based on their patterns of frequent word use and clusters of words. Explore this using Voyant Tools and AntConc.

You will have a choice of what texts you wish to compare. You may wish to try several different combinations and observe what you can of their word distributions using the corpus analysis tools we have been practicing with. When you feel you have found some meaningful and interesting patterns that seem worth comparing between the texts, write a reflection post on one of the websites you have created for this class. (You may work with either site for this assignment.) Write up a page that presents your comparison and provides images (screen captures) and links to share the source texts you used and the data you gathered. Use the images to illustrate your essay, in which you point out interesting patterns that compare or contrast these documents in your distant reading of them through the corpus analysis tools.

Choose your texts to compare

For this assignment, you have many options of texts you could choose. If you have an idea about a pair of texts to try comparing that is not on my suggestions list, ask me (Dr. B) about it. As long as you can save the documents as electronic files in plain text, and cut out any unnecessary materials (like footnotes, headings, styling, etc.), you can work with them using our corpus analysis tools. You may need to clean texts that you pull from internet sources to remove their headers, long sections of footnotes, and anything else that is not part of the main text you want to analyze.
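If you would rather script the cleanup than do it by hand, here is a minimal Python sketch. It is only an illustration, not a required part of the assignment. The function name clean_text, the marker strings, and the sample text are all made up for this example; real sources mark their headers and footers in different ways, so check your own file and adjust the markers.

```python
import re

def clean_text(raw, start_marker, end_marker):
    """Keep only the text between two marker strings, then tidy it up.

    The markers are hypothetical: look at your own file to see what
    actually separates the header and footer from the main text.
    """
    start = raw.find(start_marker)
    # If the start marker is missing, begin at the top of the file.
    body_start = start + len(start_marker) if start != -1 else 0
    # Look for the end marker only after the start of the main text.
    end = raw.find(end_marker, body_start)
    body_end = end if end != -1 else len(raw)
    body = raw[body_start:body_end]
    # Remove bracketed footnote references such as [1] or [23].
    body = re.sub(r"\[\d+\]", "", body)
    # Collapse runs of spaces and tabs, but keep paragraph breaks.
    body = re.sub(r"[ \t]+", " ", body)
    body = re.sub(r"\n{3,}", "\n\n", body)
    return body.strip()

# A made-up sample standing in for a downloaded file.
sample = (
    "SITE HEADER\n*** START ***\n"
    "The ship   left at dawn.[1]\n\n\n\nNo one waved.\n"
    "*** END ***\nSITE FOOTER"
)
cleaned = clean_text(sample, "*** START ***", "*** END ***")
print(cleaned)
```

To save the result as plain text, you could write it out with pathlib, for example Path("cleaned.txt").write_text(cleaned, encoding="utf-8"), after importing Path from pathlib.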

Text clusters to try comparing

First, a small-sized comparison set:

These texts make up a small sample. Notice the number of words and ngram tokens created when you use the corpus tools.

Then, a larger-sized comparison set:

These are larger files and you should find a greater variety of word frequency patterns here. Choose any two or three to compare with each other. You could choose to compare two texts written in nearly the same time period, or about similar topics, or choose to contrast texts that seem completely different. It is up to you to experiment. Right-click to download the linked text files to your computer to begin working with these:

English playwrights, 1600s: Christopher Marlowe and William Shakespeare
19th-century fiction:

Apply the corpus analysis tools

  1. First, prepare each of your texts for analysis, and be sure the file is saved with the extension .txt at the end so that AntConc can read the file.
  2. Copy and paste your text into Voyant Tools and get a sense of the predominant words used. Look at the word cloud and the data on the most frequently used words, and try to look at the Voyant view of each text side by side. How do your two texts compare with each other for most frequent word use? Do you see words in common, or are they totally different? Take screen captures of the word clouds and other data from Voyant Tools that you find relevant for comparison.
  3. Next, explore your texts using AntConc.
  4. Experiment with the size of your ngram clusters: Try setting a minimum of 2 and a maximum of 4 to start, and then move on to different sizes, say a minimum of 3 and a maximum of 6. If the most frequently used 2-grams are not interesting, try moving up to 3. An ngram that is too long (say, 10 words) will probably not recur often enough to show an interesting pattern.
  5. Look at some of the most frequently used ngrams, and click on them to open the Concordance view, which shows highlighted Keywords in Context (KWIC). You can then click on the highlighted KWIC passages to view exactly where they appear in the actual text. Get a sense of what kinds of passages and sentences these phrases are part of. Take some notes on this for each of the documents you are comparing.
  6. Take screen captures as you begin to see interesting patterns so you can document them in your essay. Hint: You can copy and paste the AntConc program so you can open two or three copies of it at a time to view your text data from different files side by side.
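
If you are curious about what the tools are doing in steps 2 and 5, the following Python sketch counts the most frequent words in two texts and builds a simple Keywords in Context listing. It is a rough approximation under stated assumptions, not how Voyant Tools or AntConc are actually implemented: the tokenizer, the tiny stopword list, the function names, and the two sample texts are all invented for illustration, so expect your counts in the real tools to differ.

```python
import re
from collections import Counter

# A tiny, made-up stopword list; the real tools ship with their own lists.
STOPWORDS = frozenset({"the", "was", "and"})

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def top_words(text, n=10, stopwords=STOPWORDS):
    """Return the n most frequent words that are not stopwords."""
    counts = Counter(t for t in tokenize(text) if t not in stopwords)
    return counts.most_common(n)

def kwic(text, phrase, width=3):
    """List each occurrence of phrase with `width` words of context per side."""
    tokens = tokenize(text)
    target = tokenize(phrase)
    k = len(target)
    lines = []
    for i in range(len(tokens) - k + 1):
        if tokens[i:i + k] == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + k:i + k + width])
            lines.append(f"{left} [{' '.join(target)}] {right}".strip())
    return lines

# Two made-up sample texts standing in for your .txt files.
text_a = "The sea was calm. The sea was cold and the ship was small."
text_b = "The city was loud. The city was cold and bright."

words_a = top_words(text_a, 3)
words_b = top_words(text_b, 3)
# Words that appear among the most frequent words of both texts.
shared = {w for w, _ in words_a} & {w for w, _ in words_b}

print(words_a)
print(words_b)
print(shared)
for line in kwic(text_a, "the sea", width=2):
    print(line)
```

Comparing the two top-word lists is the scripted version of putting two word clouds side by side, and the bracketed phrase in each kwic line plays the role of the highlighted keyword in the Concordance view.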

If you are not sure you are seeing anything worth comparing in the documents you selected, try changing it up: Change the ngram minimum and maximum value. The minimum value of two may not show the most interesting patterns, so try starting it at 3. You can always choose a different document from the collection, and continue experimenting.
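
To see why changing the minimum and maximum matters, here is a minimal Python sketch of counting ngrams across a range of sizes. It assumes a very simple tokenizer and uses a toy sample line; the function name ngram_counts and its parameters are invented for this example and are not AntConc settings.

```python
import re
from collections import Counter

def ngram_counts(text, n_min=2, n_max=4):
    """Count every word sequence whose length is between n_min and n_max."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

# A toy sample; substitute the contents of one of your .txt files.
sample = "to be or not to be that is the question to be or not"

small = ngram_counts(sample, 2, 4)
# Keep only the ngrams that repeat, since those are the patterns of interest.
repeated = {gram: c for gram, c in small.items() if c > 1}
print(repeated)

# Raising the minimum to 3 drops all two-word phrases from the results.
larger = ngram_counts(sample, 3, 6)
print(larger["to be"], larger["to be or not"])
```

Notice that longer sequences repeat less often than shorter ones, which is why a very large ngram size rarely turns up a frequent pattern.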

Take notes, reflect, and write a post to present your comparison analysis

As you work on the corpus analysis, take notes on things that surprise or interest you. Can you see a strong pattern that makes one writer obviously different from another? Is it a pattern you would have guessed when you started, or something surprising?

Spend some time reviewing your data, and write up a reflection post including images and screen captures from your analysis. Your post should present one pair of texts, or one trio of texts that you studied with this assignment. Present your findings: how do these texts compare and/or contrast with each other in what you could see of the distinct words and phrases they most frequently use?

Prepare your post as a webpage to present on one of the websites you developed in the previous assignment (your choice: either GitHub Pages or your WordPress or personal PSU site). Include your screenshot images on the page.

When this portion of this assignment is complete, post links to it on Canvas at the appropriate assignment link.