
Procedures for Transcribing and Tagging Manuscripts

Lisa Spiro and Carolyn Fay, Electronic Text Center, University of Virginia


[Note: This document is intended as a supplement to the Electronic Text Center's helpsheet on Transcriptional Work.]

The Electronic Text Center offers online access not only to thousands of books, poems, and short prose works, but also to hundreds of rare manuscripts. By placing these documents online, the Electronic Text Center makes rich historical and literary documents readily available to a range of users.

Preparing manuscripts for the web requires special care, since they must be transcribed and then marked up with the TEI tags designed for primary sources. For most manuscripts, we also include full-color digital images so that users can get a sense of the document as a physical object.

For examples of how the Electronic Text Center creates and presents manuscripts, see the following collections, several of which were created by participants in Rare Book School:

  • The Booker Collection: Letters written by James and John Booker to their cousin Chloe Unity Blair. The Booker brothers, who were from Pittsylvania County, Virginia, served in the 38th Virginia Infantry and participated in many battles, including Malvern Hill, Gettysburg, and Drewry's Bluff.
  • The Brooks Collection: Letters written by and about Andrew, William, Charles, and Moffett Brooks, all members of the Liberty Hall Volunteers, a company formed at Washington College (now Washington and Lee). In reading these letters, one develops a sense of the family connections among the correspondents, as they exchange news and gossip about home and about relatives who are serving in the Confederate Army.

    See also Gallery of the Liberty Hall Volunteers.

  • The Bitner Collection: This collection features letters written to Henry Bitner of Shippensburg, Pennsylvania, by six correspondents, five of whom were serving in the Union Army. Unlike the Booker and Brooks letters, which focus on family, these letters reveal the connections among a group of young men. The letters deal partly with political events and military life (Bitner himself did not serve; he remained in Pennsylvania and worked as a teacher), but they also dwell on the customs of youth: attending "singings" and spelling bees, flirting with young women, picking apples, and so forth.

    See also Introduction to the Bitner Letters.

  • Liberia Letters: Beginning in the 1830s, former Virginia slaves settled in Liberia with the assistance of the American Colonization Society. In their letters, correspondents discuss matters such as farming, disease, politics, and religion, and they request that essential supplies be sent to them.

For both the Brooks and the Bitner projects, we have collaborated with the Valley of the Shadow project.


In preparing letters, diaries, and other manuscripts for the Electronic Text Center's collections, we aim to meet two related goals:

  • Accuracy: A primary goal of documentary editing is to preserve as many features of the original document as possible. To this end, we carefully transcribe each page, noting and preserving such features as line breaks, underlining, postscripts scrawled in margins, changes in hand, and so forth. TEI includes a number of tags that enable an editor to describe these textual and non-textual features. For instance, we record the content and location of additions and deletions with the <add> and <del> tags, and we mark errors in the text and editorial emendations with <sic>, <corr>, and <orig reg>.

    To give users a rich visual sense of the original document, we include high-quality digital images that reveal such details as handwriting, the color of the paper, ink spots, smudges, and so forth. We can make these images available in a variety of sizes, so that users can decide how much detail they want and how quickly they want an image to load.

  • Accessibility: Even as we strive to replicate the original document as accurately as we can, we also want the text to be accessible to as many users as possible, for as many uses as imaginable. Simply putting the text and its accompanying images on the web makes a rare, unique document available to millions of users. Because these texts are fully searchable, scholars can also discover previously unknown connections among documents.

    However, some users might be befuddled by the idiosyncratic spelling of, say, Civil War soldiers. Therefore, when we encounter a misspelling, we enclose it in an <orig> tag whose reg attribute supplies the conventional spelling:

    <orig reg="computer">kompewtrr</orig>

    By using this tag, we allow a user to search for the word as it is conventionally spelled. Moreover, we can use this tag to create two versions of the document: the original transcription (which scholars might use) and the modernized version (which young students might consult).

    For some of our manuscript projects, we include background information so that users can begin to make sense of seemingly obscure references. We annotate sparingly, however, since we want users to reach their own understanding of the texts. For an example of a site that presents Civil War manuscripts along with contextual information, see The John and James Booker Civil War Letters.
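The search benefit of <orig reg> can be illustrated with a minimal Python sketch. It is not the Electronic Text Center's search system; it assumes that <orig> elements are not nested and that each reg value contains no double quotes, and the helper names spelling_pairs and search are hypothetical. A query matches an <orig> element when it equals either the transcribed spelling or the regularized one, ignoring case.

```python
import re

# Assumes <orig> elements are not nested and reg values contain no double quotes.
ORIG = re.compile(r'<orig reg="([^"]*)">(.*?)</orig>', re.DOTALL)


def spelling_pairs(markup):
    """Return (transcribed, regularized) pairs for every <orig reg> element."""
    return [(orig.strip(), reg.strip()) for reg, orig in ORIG.findall(markup)]


def search(markup, term):
    """Return transcribed spellings whose original or regularized form equals term."""
    term = term.lower()
    return [orig for orig, reg in spelling_pairs(markup)
            if term in (orig.lower(), reg.lower())]


sample = ('I <orig reg="expect">exspect</orig> '
          '<orig reg="there">thare</orig> will be a')
print(search(sample, "there"))     # ['thare']
print(search(sample, "exspect"))   # ['exspect']
```

A reader searching for "there" thus finds the soldier's "thare", while a scholar searching for the original spelling still finds it.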

Work Flow

In transcribing and tagging a document, we go through a series of steps:

  1. Checking Transcriptions: When we receive a new manuscript project, the first step has usually been completed already: someone has done a preliminary transcription. Often, however, the transcriber has overlooked or mistranscribed some crucial words. To correct these errors, and to get to know the texts, we begin by comparing the transcription with the digital images of the original document (and with the document itself, if possible).

  2. Researching Confusing Passages: If we have questions about unclear words or phrases, we will do the research necessary to answer them, turning to resources such as the Oxford English Dictionary and Encyclopaedia Britannica.

  3. Tagging: With the transcription completed, we are ready to do the basic tagging. First we mark up the overall structure of the document with divisional tags (e.g., <div1>); if we are tagging a letter, we also mark up the <opener> and <closer>. Next we tag paragraphs (<p>), line breaks (<lb>), abbreviations (<abbr>), deletions (<del>), additions (<add>), and regularized spellings (<orig reg=>).

  4. Adding Informational Notes: When a document is full of obscure references to people, places, and events, we often add short informational notes to aid the reader's comprehension.

  5. Processing Images: Typically, Special Collections has created a digital image of each manuscript page. To process the images, we use editing tools such as ImageMagick and XV. We insert the images into the file using the <figure> tag, and we describe each image in a <figDesc> tag.

  6. Creating the Header: We prepare the TEI header using the Electronic Text Center's web-based form, recording such information as the names of the author and the recipient, the date, and so forth.

  7. Parsing the File: After the header has been joined to the body of the text (with the Unix "cat" command), we look over the file. If the text appears to be in good shape, we check the tagging by parsing the file. First we run multidocs to make sure that each tag opens and closes; then we use our "parse" program to check that the tags conform to the TEI Lite guidelines. If the file passes both of these tests, it is ready to go online.

  8. Proofreading the File: Once the text is put online, we proofread it carefully to make sure that the images load, that line breaks appear, and that the transcription is complete.
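The first check in the parsing step (does every tag that opens also close?) can be sketched in Python with a stack. This is a hypothetical stand-in, not multidocs or the "parse" program: it assumes that <lb> and <pb> are the only empty elements in the markup, and it does not validate against the TEI Lite DTD, which requires a real SGML or XML parser.

```python
import re

# Assumption: <lb> and <pb> are empty elements that never take a closing tag.
EMPTY_ELEMENTS = {"lb", "pb"}
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")


def check_tags(markup):
    """Report tags that are closed out of order, closed without opening, or never closed."""
    problems = []
    open_tags = []  # stack of (element name, line number)
    for match in TAG.finditer(markup):
        is_close = match.group(1) == "/"
        name = match.group(2).lower()
        line = markup.count("\n", 0, match.start()) + 1
        if name in EMPTY_ELEMENTS:
            continue
        if not is_close:
            open_tags.append((name, line))
        elif not open_tags:
            problems.append(f"line {line}: </{name}> has no opening tag")
        elif open_tags[-1][0] == name:
            open_tags.pop()
        else:
            problems.append(f"line {line}: </{name}> found, expected </{open_tags[-1][0]}>")
    for name, line in open_tags:
        problems.append(f"line {line}: <{name}> is never closed")
    return problems


body = '<div1 type="letter">\n<p>Dear Cousin<lb>\nI write you a few lines</p>\n</div1>'
print(check_tags(body))   # []
```

For example, check_tags('<p><abbr expan="December">Dec.</p>') reports that </p> appeared where </abbr> was expected, then lists both <p> and <abbr> as never closed.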

Some examples of tagging and transcription at work

In working on manuscript projects, we have come across several difficult cases that present both tagging and transcriptional challenges. Three of these cases follow.

The Case of the Mysterious Place Name

When we first confronted this scrawled place name (which is taken from John Booker's letter of December 22, 1863), we were utterly lost:

[Image: Booker letter in which the place name is difficult to discern]

Since it was important to establish where Booker was writing from, we tried a variety of techniques to figure out what these words said. We traced them out on our own paper; we compared these characters to other characters in the letter; we called in others to consult with us; we looked at maps of North Carolina. Ultimately, two rather obvious clues enabled us to figure out the solution: first, G. Howard Gregory's 38th Virginia Infantry told us that Booker's company was encamped at Kinston, North Carolina during the winter of 1863-1864; second, James Booker's letter of January 1, 1864 was written from Kinston.

Once we had determined the correct spelling of the place name, we were able to tag the dateline as follows:

<name type="place"> Camp Near <orig reg="Kinston"> Kiston</orig>,
<abbr expan="North Carolina">N. C. </abbr></name> <lb>
<date n="1863-12-22"> <abbr expan="December">Dec. </abbr>
the 22<hi rend="superscript"> <orig reg="nd">th </orig> </hi> 1863</date>
<salute>Dear Cousin Unity</salute>

We could have used the <sic> or the <corr> tags to mark or correct the misspelling of Kinston, but opted instead to use the <orig reg> tag and to make a note offering additional information about Kinston.

The modernized version of the dateline would appear as follows:

Camp Near Kinston, North Carolina
December the 22nd 1863

Dear Cousin Unity
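The relationship between the markup and the two displayed versions can be illustrated with a minimal Python sketch. It is not the Center's TEI filter; it assumes that <orig> and <abbr> elements are not nested, treats newlines in the source as ordinary spaces and <lb> as the only line break, and simply drops every other tag. The sample copies the Kinston dateline without the stray spaces inside its elements, since a renderer this naive would otherwise print "22 nd" instead of "22nd". The function name render is hypothetical.

```python
import re

ORIG = re.compile(r'<orig reg="([^"]*)">.*?</orig>')
ABBR = re.compile(r'<abbr expan="([^"]*)">.*?</abbr>')
TAG = re.compile(r"<[^>]+>")


def render(markup, modernized=False):
    """Render the markup as the original or the modernized plain-text version."""
    text = markup.replace("\n", " ")       # source newlines are only spacing
    if modernized:
        text = ORIG.sub(r"\1", text)        # use the regularized spelling
        text = ABBR.sub(r"\1", text)        # use the expanded abbreviation
    text = text.replace("<lb>", "\n")       # manuscript line breaks
    text = TAG.sub("", text)                # drop all remaining tags
    lines = (" ".join(line.split()) for line in text.split("\n"))
    return "\n".join(line for line in lines if line)


DATELINE = (
    '<name type="place">Camp Near <orig reg="Kinston">Kiston</orig>,\n'
    '<abbr expan="North Carolina">N. C.</abbr></name><lb>\n'
    '<date n="1863-12-22"><abbr expan="December">Dec.</abbr>\n'
    'the 22<hi rend="superscript"><orig reg="nd">th</orig></hi> 1863</date>'
)

print(render(DATELINE))
print(render(DATELINE, modernized=True))
```

Called with modernized=True, render returns the two lines "Camp Near Kinston, North Carolina" and "December the 22nd 1863"; with the default, it keeps "Kiston", "N. C.", "Dec.", and "22th".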

The Case of Remembering Memory

[taken from the helpsheet on "Transcriptional Work"]

As noted above, researching the context of an unclear passage in a manuscript can often help one determine the content of the passage.

Example: John and James Booker Collection. Letter to Chloe Unity Blair from John Booker, December 22, 1863, page 3. UVa Special Collections: MSS 11237.

[Image: excerpt from John Booker's letter mentioning Memory Inman]

Upon initial reading, the above passage was difficult to transcribe. Our first attempt yielded:

I exspect thare will be a <lb>
weding near you in the christmas Memory <lb>
I <unclear>must</unclear> start home in the morning on furlow<lb>

The proximity of "christmas" and "Memory," and the lack of any punctuation between them, led us to believe that the two words went together. However, it was difficult to make sense of the following sentence, and what we had rationalized as the verb "must" looked more like "man." A little research quickly resolved the confusion. First, John Booker's military service records did not indicate that he received a furlough in December 1863. Then, consulting the regimental roster for the 38th Virginia Infantry, we discovered that a soldier named Memory Inman had enlisted in Company D of the 38th along with John and James Booker. The passage should thus be tagged:

I <orig reg="expect">exspect</orig>
<orig reg="there">thare</orig> will be a <lb>
<orig reg="wedding">weding</orig> near you in the
<orig reg="Christmas.">christmas</orig>
<name type="person">Memory <lb>
Inman</name> starts home in the morning on
<orig reg="furlough">furlow</orig><lb>

When put through the TEI filter, the passage will appear as follows:

Original version
I exspect thare will be a
weding near you in the christmas Memory
Inman starts home in the morning on furlow

Modernized version
I expect there will be a
wedding near you in the Christmas. Memory
Inman starts home in the morning on furlough
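The first attempt at this passage recorded its doubtful reading with <unclear>, which makes such readings easy to gather into a worklist for the kind of research described in this case. A minimal Python sketch, assuming each <unclear> element has no attributes and sits on a single line of the file; the function name unclear_readings is hypothetical.

```python
import re

UNCLEAR = re.compile(r"<unclear>(.*?)</unclear>")
TAG = re.compile(r"<[^>]+>")


def unclear_readings(markup):
    """Return (line number, tentative reading, plain line text) for each <unclear>."""
    results = []
    for number, line in enumerate(markup.splitlines(), start=1):
        for reading in UNCLEAR.findall(line):
            plain = " ".join(TAG.sub("", line).split())  # line text without tags
            results.append((number, reading, plain))
    return results


FIRST_ATTEMPT = (
    "I exspect thare will be a <lb>\n"
    "weding near you in the christmas Memory <lb>\n"
    "I <unclear>must</unclear> start home in the morning on furlow<lb>\n"
)

for number, reading, plain in unclear_readings(FIRST_ATTEMPT):
    print(f"line {number}: '{reading}' in: {plain}")
```

Listing each tentative reading with its line of context keeps the doubtful words in view while checking service records and rosters.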

The Case of the Multiple Correspondents

Several of the letters that we have edited were written by more than one correspondent. See, for instance, James and John Booker's letter of August 3, 1862. Initially, we were not sure how to handle this: should we treat John Booker's additions as a postscript, or as a separate textual division?

In editing this letter, we decided to use two numbered divisions and to include a note about the long postscript. The tagging is as follows:

<signed>James Booker</signed><lb>
<seg type="recipient">to <name>Miss C. U. Blair</name></seg>
<div1 type="letter">
<pb n="3">
<figure entity="F62AU3P3">
<figDesc>Third page of manuscript Civil War letter from James and John Booker to their cousin Chloe Unity Blair, dated August 3, 1862.</figDesc> </figure>
<date n="1862-08-03">
Sunday <orig reg="evening">eavning</orig>
August the 3 <hi rend="supralinear">1862</hi></date>
<salute>Dear Cousin</salute>
I write you a few lines<lb>

Tagging Envelopes

Letters are often accompanied by envelopes. Although no TEI standard for tagging envelopes exists, we have decided to include them in the front matter, on the assumption that an envelope is not part of the letter proper but is probably what a reader encountered first. We mark relevant information such as <name> and <date>, and we include digital images of the envelope so that users can see such features as postmarks, sealing wax, and so forth. Consider the following example, taken from the Liberia Letters: William Douglass to Dr. James H. Minor, 1857 February 5.

<div2 type="envelope">
<figure entity="page image name goes here"></figure>
29 24<lb>
<name type="person">Dr. James Minor</name><lb>
<name type="place">Cobham Depot Albemarle <lb>
<abbr expan="County">Co.</abbr><lb>
Virginia <abbr expan="United States">States</abbr></name><lb>
Via <name type="place">England</name>
</div2>

This tagging would produce the following text in the modernized version:

29 24
Dr. James Minor
Cobham Depot Albemarle
County
Virginia United States
Via England
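Because the names on an envelope are tagged, they can be gathered for indexing. A minimal Python sketch, not part of the Center's tools: the function name names_by_type is hypothetical, abbreviations are replaced by their expan values, and <name> elements are assumed not to nest.

```python
import re

NAME = re.compile(r'<name type="([^"]*)">(.*?)</name>', re.DOTALL)
ABBR = re.compile(r'<abbr expan="([^"]*)">.*?</abbr>', re.DOTALL)
TAG = re.compile(r"<[^>]+>")


def names_by_type(markup):
    """Group the text of each <name type="..."> element by its type attribute."""
    grouped = {}
    for kind, content in NAME.findall(markup):
        content = ABBR.sub(r"\1", content)               # prefer the expanded form
        text = " ".join(TAG.sub(" ", content).split())   # drop <lb> and other tags
        grouped.setdefault(kind, []).append(text)
    return grouped


ENVELOPE = """<name type="person">Dr. James Minor</name><lb>
<name type="place">Cobham Depot Albemarle <lb>
<abbr expan="County">Co.</abbr><lb>
Virginia <abbr expan="United States">States</abbr></name><lb>
Via <name type="place">England</name>"""

for kind, names in names_by_type(ENVELOPE).items():
    print(kind, names)
```

Run on the envelope above, the sketch groups "Dr. James Minor" under person and lists "Cobham Depot Albemarle County Virginia United States" and "England" under place.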