The Text:

For this assignment, work with the plain-text Project Gutenberg file of George Bernard Shaw's play, Pygmalion. Download the file, open it in <oXygen/>, and remove the Project Gutenberg boilerplate at the top and bottom of the file, so that you have only the text of the play to work with.
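Deleting the boilerplate by hand in <oXygen/> is all this step requires. If you would like to see the same idea as a script, here is a minimal Python sketch. It assumes the file marks the body of the book with lines beginning `*** START OF` and `*** END OF`; the exact wording of those marker lines varies between Project Gutenberg files, so check your own copy first. The function name `strip_gutenberg_boilerplate` is our own invention for illustration.

```python
import re


def strip_gutenberg_boilerplate(text: str) -> str:
    """Keep only what lies between the START and END marker lines.

    Assumption: the marker lines begin with '*** START OF' and
    '*** END OF'. If either marker is missing, the text is returned
    unchanged so that nothing is deleted by accident.
    """
    start = re.search(r"^\*\*\* ?START OF.*$", text, flags=re.MULTILINE)
    end = re.search(r"^\*\*\* ?END OF.*$", text, flags=re.MULTILINE)
    if not start or not end:
        return text
    # Slice out the body and tidy the surrounding blank lines.
    return text[start.end():end.start()].strip() + "\n"
```

Anything that remains between the markers but is not part of the play still has to be removed by hand.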

Your Task:

Your goal is to use Find/Replace operations to prepare an XML-encoded digital edition of the play. This time, the specific markup tags you use are up to you, but we expect to see specific structural distinctions marked in your XML. These are described below.

Use Find and Replace operations, with or without regular expression patterns, to create descriptive (rather than presentational) XML markup. We would write an XSLT transformation to convert this XML into an HTML digital edition for presentation, but this particular task is to produce the XML that identifies, holds, and nests the structural units of the play: acts holding scene descriptions and speeches, speeches holding speaker identifications and stage directions, and so on. You should not use manual tagging except in situations that occur so rarely that there is really no point in devising an autotagging solution. (For example, you do not need an autotagging strategy to tag the title of the whole play or to create a root element for your XML: just do that manually.)
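To make the idea of nested, descriptive markup concrete, here is a small sketch of the same kind of Find and Replace sequence written in Python rather than in the <oXygen/> dialog. It is an illustration under stated assumptions, not a solution to the assignment: the sample text is invented and simplified, each speech is assumed to sit on a single line that begins with the speaker's name in capitals, and the element names (`play`, `act`, `sp`, `speaker`, `stage`) are just one possible choice.

```python
import re
import xml.etree.ElementTree as ET

# Invented, simplified sample. The real file's layout may differ
# (for instance, speeches usually wrap across several lines).
sample = (
    "ACT I\n"
    "\n"
    "HIGGINS. What is your name?\n"
    "\n"
    "LIZA [shyly] Liza Doolittle.\n"
)

# Step 1: each act heading closes the previous act and opens a new one.
text = re.sub(r"^ACT ([IVX]+)$", r'</act>\n<act n="\1">', sample,
              flags=re.MULTILINE)
# Rare cases fixed "by hand": drop the stray first close tag and
# close the final act.
text = text.replace("</act>\n", "", 1) + "</act>\n"

# Step 2: square-bracketed passages become stage directions.
text = re.sub(r"\[([^\]]+)\]", r"<stage>\1</stage>", text)

# Step 3: a line that opens with a name in capitals becomes a speech.
# Lines already starting with a tag are skipped because they begin with "<".
text = re.sub(r"^([A-Z]{2,}(?: [A-Z]{2,})*)\.? +(.+)$",
              r"<sp><speaker>\1</speaker> \2</sp>", text,
              flags=re.MULTILINE)

# The root element is added manually, as the assignment allows.
xml = "<play>\n" + text + "</play>\n"

# Parsing confirms that the result is well-formed XML.
root = ET.fromstring(xml)
print(xml)
```

The order of the steps matters: tagging acts first keeps the speech pattern from mistaking `ACT I` for a speaker, and tagging stage directions before speeches lets each `<stage>` end up nested inside its `<sp>`. A real text will also contain speeches that open with a capitalized word or run over several lines, so expect to refine the patterns as you test them.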

Consult our Guide to Autotagging with Regular Expressions and notes from class on regular expressions as you work. The TEI provides some helpful guidelines for tagging the structural units of plays, and the Digital Mitford project's Codebook lists some of the basic elements that you may wish to apply in this assignment. If you choose to follow the Mitford Codebook's model template for a TEI-encoded play, don't worry about the cast list portion, since this edition of Shaw's play lacks one, and don't worry about fine-tuning the attributes and values. Just concentrate on encoding the different structural parts of the play using appropriate elements.