
Transcriptional Work

Guidelines for SGML Text Mark-up at the Electronic Text Center
David Seaman, Carolyn Fay and Lisa Spiro, Electronic Text Center, University of Virginia

The Electronic Text Center collection includes a growing number of manuscripts--letters, diaries and other documents--most of which belong to the University of Virginia Library's Special Collections. These texts are processed either by Etext staff or by participants in the Rare Book School at UVa.

Our goal is to provide electronic manuscripts that are not only attractive and easy to read, but also accurate and useful to scholars, teachers and students, whose use of the documents may differ significantly. For example, American Civil War scholars may consult our collection of Civil War letters for uses of specific words or to find particular rhetorical strategies or styles; whereas high school students may read the same letters for thematic projects on religion or family in the Civil War. To allow a variety of users to view and search the texts in different ways, we process both the images and text of an electronic manuscript using the following procedures:

See also Lisa Spiro and Carolyn Fay's Procedures for Transcribing and Tagging Manuscripts


So that users can experience the full flavor of the manuscript, our electronic editions include color digital images of the manuscript pages, scanned by the UVa Library's Special Collections, usually in 24-bit color at 400 dpi. These images appear in the electronic text as in-line gifs that are linked to larger jpeg versions. When possible, images of the entire leaf, both verso and recto, are included, as well as images of the individual pages in order. In addition, we often offer a range of versions: "small," "medium" and "large" quality. The small images load most quickly, while the large images may be easier to read but take longer to load.

Example: The John and James Booker Civil War Letters


Our SGML-encoded electronic manuscripts use tags which allow readers not only to search for specific key fields (dates, names, places), but also to view different versions of the same text. Using the <orig> tag, we can record both period and modernized spelling, capitalization, and punctuation. A modern and original version can then be generated "on-the-fly" from the same SGML transcription, accommodating users who desire to read the documents exactly as written as well as users who prefer a modernized text.
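The generation of two readings from one transcription might be sketched as follows. This is only an illustration in Python, not the Center's actual filter: the function name `render` and the regular expressions are assumptions, and the sketch assumes double-quoted attribute values and no nested <orig> elements.

```python
import re

# Assumed patterns (hypothetical, for illustration only).
ORIG = re.compile(r'<orig\s+reg\s*=\s*"([^"]*)"\s*>(.*?)</orig>', re.DOTALL)
LB = re.compile(r'<lb\s*/?>')
TAG = re.compile(r'<[^>]+>')


def render(sgml, modernized):
    """Return a plain-text reading of a tagged fragment.

    modernized=True keeps the value of the "reg" attribute;
    modernized=False keeps the content of the <orig> element.
    """
    # Source line endings carry no meaning; <lb /> records the lineation.
    text = ' '.join(sgml.split())
    group = 1 if modernized else 2
    text = ORIG.sub(lambda m: m.group(group), text)
    # Turn each recorded line break into a real newline, then drop other tags.
    text = LB.sub('\n', text)
    text = TAG.sub('', text)
    lines = [' '.join(line.split()) for line in text.split('\n')]
    return '\n'.join(lines).strip()


sample = (
    'may find you all <orig reg="enjoying">injoying</orig> the same <lb />\n'
    '<orig reg="blessing. The">blesing, the</orig> health of our company<lb />'
)

print(render(sample, modernized=False))
print(render(sample, modernized=True))
```

Because the regularized form lives in an attribute, either reading can be produced from the same file without keeping two transcriptions in step.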

Core TEI Tags for Transcription

  • Structural Elements:
    • At each division level, document type and date are recorded as attributes of the division. The head may include the title of the collection, the date, the manuscript author and recipient(s) if applicable.
      Example: Letter, 3 November 1859, Elliot Muse Healy. UVa Special Collections: MSS 10496: Papers of the Healy Family.

      <div1 type="letter" n="1859-11-03">
      <head>MSS 10496: Papers of the Healy Family
      <lb />
      Letter from Elliot Muse Healy, 3 November 1859, 4pp.</head>
    • Many manuscripts, especially letters and diary entries, take the tags <opener> and <closer>, which group together the dateline, byline, salutation and similar phrases often appearing at the beginning or end of a letter. <opener> and <closer> may contain the following tags:

      1. <dateline>: contains a brief description of the place, date, time, etc. of the production of a letter, newspaper story, journal entry, etc.
      2. <date>: contains a date in any format. This tag must be contained within the <dateline> tag.
      3. <salute>: contains a salutation or greeting prefixed to a foreword or letter, or the salutation at the closing of a letter.
      4. <signed>: contains the closing salutation, especially a signature, appended to a letter.
      Example: Bitner Collection: Letter to Henry A. Bitner from Alex Cressler, 1861, May 17. UVa Special Collections: MSS 11395.

      <opener>
      <dateline><name type="place">Chambersburg</name>
      <date value="1861-05-17">May 17th, 1861</date></dateline>
      <salute>My good old Friend:--</salute>
      </opener>
      Example: Liberian Letters: George Walker to Dr. James H. Minor 1858 January 27. UVa Library, call number MSS 10460 and MSS 10460-a

      <closer>
      <salute>Yours Truly</salute><lb />
      <signed><name type="person">George Walker</name></signed>
      </closer>
    • Original lineation and hyphenation are maintained using the <lb /> tag throughout the body of the text. For the search tool, hyphenated words are closed using the <orig> tag with the "reg" attribute.
      See below for an example.
  • Marking Data in the Text
    • <name>: Contains a proper noun or noun phrase. May take the attribute "type" with a value such as "person", "place", "institution" or "regiment".

      Example: Brooks Collection: Letter to Eleanor Stuart Brooks from Andrew Brooks, 1863 March 17.

      I am sorry <name type="person">Miss Sue Harden</name> is about to<lb />
      leave the neighborhood - young people<lb />
      are sadly scarce there.
  • Recording Corrections, Regularizations, Abbreviations, Omissions, Additions and Editorial Changes
    • <abbr>: Contains an abbreviation of any sort. Takes the attribute "expan" to indicate the complete word or phrase.
      Example: "Slave Records," Gustavus Brown Alexander Papers. Letter to George H. Robinson from Charles Alexander, November 27, 1867. UVa Special Collections MSS 4800; Box 5.

      Boyd's, <name type="place">Hale
      <abbr expan="Virginia">Va.</abbr></name>
    • <add>: Contains letters, words, or phrases inserted in the text by an author, scribe, annotator or corrector. May take the attribute "place" to indicate where the additional text is written. May also take the attribute 'resp="editor"' for editorial additions. In this case, the editor must be declared in the header where the editor's name appears: <name id="editor">
      Example: 19th-Century American History, Clifton Waller Barrett Library. Contract for indenture of Susan, a girl of five years, August 19, 1865, Anonymous. UVa Special Collections MSS 6060.

      That the said<lb />
      Lieut Ab S Dial &c. &c. by virtue of the authority in him<lb />
      vested as Military <del>Commandant</del>
      <add place="supralinear">agent</add>
      aforesaid, hath put and<lb />
    • <corr>: Contains the correct form of a passage apparently erroneous in the copy text. The corr tag is a mirror of sic: the latter leaves the original text untouched, giving the correction as an attribute value; the former substitutes the correction, leaving the original reading as an attribute value. The choice between them is up to the encoder. See entry for <sic> tag below.

      Example: Liberian Letters: Henry Franklin and Milly Franklin to Dr. James H. Minor, 27 January 1858. UVa Library, call number MSS 10460 and MSS 10460-a

      respects to all my <orig reg="inquiring">Enquiring</orig> <lb />
      friends both <corr sic="while">white</corr> &
      <orig reg="Colored">Coloured</orig>
    • <del>: Contains a letter, word or passage deleted, marked as deleted, or otherwise indicated as superfluous or spurious in the copy text by an author, scribe, annotator or corrector. May take the attribute "rend" which indicates how the deletion was made in the manuscript.
      Example: A Bill for the Establishment of an University, Thomas Jefferson, 1818. UVa Special Archives Acc# 38-420.

      after which their<lb />
      meetings <del rend="overstrike">soon to be</del> stated and occasional, shall be as hereinbefore<lb />
    • <orig>: Contains the original form of a word or phrase, for which a regularized form is given in the attribute "reg". In the following example, the <orig> tag is used to regularize end-of-line hyphenation.
      Example: The Diary of Nancy Emerson, 1860-1870. UVa Special Collections MSS 9381.

      A <orig reg="sacramental">sac<lb />
      ramental</orig> meeting was in progress
      in this place at the<lb />
      same time.

      <orig> resembles <sic> and <corr>, but rather than making a judgment about the correctness of a word or phrase, it embeds the regularized reading in the attribute, as in the following example.

      Example: John and James Booker Collection. Letter to Chloe Unity Blair from James Booker, March 16, 1864. UVa Special Collections: MSS 11237

      that two of our Co. D. had taken the Oath of
      <orig reg="Allegiance">iligeans </orig>
    • <sic>: Contains text reproduced as is although it is apparently incorrect or inaccurate.
      See entry for <corr> tag above.
      Example: 19th-Century American History, Clifton Waller Barrett Library. Half a Hundred Reasons Why the American People Should Favor Free Coinage, Anonymous, n.d. UVa Special Collections MSS 38-11, Box 1, folder "1855-1949."

      28. Because more <sic>emplayment</sic> means a greater demand for labor,<lb />
      thus by increasing the demand and lessening the supply you raise wa-<lb />
    • <unclear>: Contains a word, phrase, or passage which cannot be transcribed with certainty because it is illegible or inaudible in the source. May take the following attributes:
      • "reason" indicates why the material is difficult to transcribe
      • "resp" denotes the person responsible for the transcription
      • "cert" signifies the degree of certainty ascribed to the transcription of the text contained within the <unclear> tags.
      Example: John and James Booker Collection. Letter to Chloe Unity Blair from John Booker, February 19, 1862. UVa Special Collections: MSS 11237.

      Jimey is quite sick &<lb />
      have bin for the last week,<lb />
      I dont no whats the mater with<lb />
      him, he have weekened down
      <unclear reason="faded">as</unclear><lb />
      fast for the last week as I<lb />
      ever saw any one, he dont eat
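
To suggest how tags such as <name> and <orig> can support searching on key fields, here is a minimal Python sketch that collects the names in a tagged fragment, grouped by the value of "type". It is not the Center's search tool: the function `index_names` and the regular expressions are assumptions made for illustration, and the sketch assumes double-quoted attribute values and no nested <name> or <orig> elements. The sample lines are fragments drawn from examples in this document.

```python
import re
from collections import defaultdict

# Assumed patterns (hypothetical, for illustration only).
ORIG = re.compile(r'<orig\s+reg\s*=\s*"([^"]*)"\s*>.*?</orig>', re.DOTALL)
NAME = re.compile(r'<name\s+type\s*=\s*"([^"]*)"\s*>(.*?)</name>', re.DOTALL)
TAG = re.compile(r'<[^>]+>')


def index_names(sgml):
    """Map each name type to the list of names found, in document order."""
    # Search on the regularized reading, so "Farfax" is indexed as "Fairfax".
    regularized = ORIG.sub(lambda m: m.group(1), sgml)
    index = defaultdict(list)
    for kind, content in NAME.findall(regularized):
        # Drop inner tags such as <lb /> and collapse whitespace.
        index[kind].append(' '.join(TAG.sub(' ', content).split()))
    return dict(index)


sample = (
    'I am sorry <name type="person">Miss Sue Harden</name> is about to<lb />\n'
    'stationed at <name type="place"><orig reg="Fairfax">Farfax</orig> Court House</name>\n'
    '<name type="person">Memory <lb />\nInman</name> starts home'
)

print(index_names(sample))
```

Indexing the regularized form means a reader who searches for a modern spelling still finds a name that is spelled differently, or broken across lines, in the manuscript.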

Notes on Transcription


To ensure that our transcriptions are as accurate as possible, the electronic text is checked several times against the best digital image we have of the manuscript. Spelling, grammar, lineation and hyphenation are recorded exactly as they appear in the manuscript. We also note the content and location of deletions and additions; one could also mark non-textual features of the manuscript including watermarks, stamps, type of paper, etc.

How to Handle Difficult Words and Passages

  • Consider the context surrounding the unknown word, and ask what word would make sense in that position.
  • Ask other people to look at the word. Often collaborators have more success than individuals.
  • Look at the original, if possible.
  • Do a little research. For instance, if you are having difficulty figuring out the name of a particular person, but know that he was a general in the Union Army during the Civil War, examine a list of Union generals.
  • Use digital imaging tools such as XV or Photoshop to enlarge the image or to separate dark portions from light portions.
  • Take some time away from the transcription. Often when you return to a difficult passage, it will be much clearer.
  • Draw the word that you see before you on your own sheet of paper.
  • Get to know the author's handwriting. If you can't decide whether a letter is a K or an R, for instance, look for examples of each and compare them to the unknown word or letter.

Notes on Annotations and Research

Annotations to the electronic manuscript are made using the <note> tag, which takes the attributes "target" and "id." References to people, places and events may be annotated, as well as any physical features of the manuscript that would not be otherwise apparent in the electronic text.

Example: Angelica Schuyler Church Papers. Letter to Angelica Schuyler Church from Alexander Hamilton, November 8, 1789. UVa Special Collections MSS 11245.

The Baron little Phillip<note target="n4">4</note>
and myself, with her consent, walked down<lb />
to the Battery, where with aching hearts and anxious eyes we<lb />
saw your vessel,

<note id="n4">[4] Philip Hamilton (1782-1801) was the eldest son of Alexander and Elizabeth Hamilton.</note>
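
Since each note reference depends on a "target" value that matches the "id" of a note, a simple consistency check can catch broken links. The following Python sketch is an assumption-laden illustration, not part of the Center's procedures: the function `unresolved_note_targets` and its regular expressions are hypothetical, and the sketch assumes double-quoted attribute values with "target" or "id" as the first attribute of <note>.

```python
import re

# Assumed patterns (hypothetical, for illustration only).
TARGET = re.compile(r'<note\s+target\s*=\s*"([^"]*)"')
NOTE_ID = re.compile(r'<note\s+id\s*=\s*"([^"]*)"')


def unresolved_note_targets(sgml):
    """Return the target values that have no matching <note id="...">."""
    ids = set(NOTE_ID.findall(sgml))
    return [target for target in TARGET.findall(sgml) if target not in ids]


sample = (
    'The Baron little Phillip<note target="n4">4</note>\n'
    'and myself, with her consent, walked down<lb />\n'
    '<note id="n4">[4] Philip Hamilton (1782-1801) was the eldest son of '
    'Alexander and Elizabeth Hamilton.</note>'
)

print(unresolved_note_targets(sample))
```

An empty list means every reference in the fragment points at an existing note.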

Annotations not only situate the manuscript in context, but are also useful in clearing up transcription problems. As noted above, researching the context of an unclear passage in a manuscript can often help one determine the content of the passage.

Example: John and James Booker Collection. Letter to Chloe Unity Blair from John Booker, December 22, 1863, page 3. UVa Special Collections: MSS 11237.

[Image: excerpt from the John Booker letter mentioning Memory Inman]
Upon initial reading, the above passage was difficult to transcribe. Our first attempt yielded:

I exspect thare will be a <lb />
weding near you in the christmas Memory <lb />
I <unclear>must</unclear> start home in the morning on furlow<lb />
The proximity of "christmas" and "Memory" and the lack of any punctuation between them led us to believe that the two words went together. However, it was difficult to make sense of the following sentence and what we rationalized as the verb "must" looked more like "man." By doing a little research, we were able to clear up these mysteries. First of all, John Booker's military service records did not indicate that he received furlough in December of 1863. Then, in consulting the regimental roster for the 38th Virginia Infantry, we discovered a soldier named Memory Inman had enlisted in the 38th, Company D along with John and James Booker. The passage should thus be tagged:

I <orig reg="expect">exspect</orig>
<orig reg="there">thare</orig> will be a <lb />
<orig reg="wedding">weding</orig> near you in the
<orig reg="Christmas.">christmas</orig>
<name type="person">Memory <lb />
Inman</name> starts home in the morning on
<orig reg="furlough">furlow</orig><lb />


Sample Letter: James Booker to Chloe Unity Blair, October 8, 1861

To see how one might use the core transcriptional tags to mark up a manuscript, consider the following example, taken from the Booker Collection. To show each stage of the transcription and tagging process, we provide page images of the letter, a faithful transcription of the manuscript in which line breaks are preserved, the complete tagging for the letter, and commentary on our tagging decisions.

Page Images

Page 1 Page 2


Manassas junction
Oct. 8th 1861

Dear Cousin

I write afew lines this
morning to inform you that I am well
at this time and hopeing that it
may find you all injoying the same
blesing, the health of our company
is better at this time than it has
bin for some time,

I have no news of intrust to write
to you, it is thought that we
will have a battle in a few days, its
reported that thay was fighting
yesterday at fawls Church I dont [know] weth
er it was so or not, one of the Dan
ville Grays was upto see us last night
he said the yankees was in four
miles of them thay are stationed at
Farfax Court House six miles a head of
us, it is thought that we will
have a verry hard battle when it
does come off, I received a letter from
Addie [add note 1] last eavning it [ [unclear: ] ] afforded me
great pleasure to hear that he was
improveing so fast,

I will ad no more at [unclear: present] so good bye

[Page 2]

write soon to your affectionate Cousin

James Booker

To Miss C. U. Blair


[1] "Addie" probably refers to Drury Addison Blair (1839-1864), the
Bookers' cousin. Blair joined Company D when it was formed in May of
1861, but was discharged due to chronic bronchitis in August of 1861
(Gregory 81). See James Booker's letter of July 14, 1861, in which "A.
Blair" includes a postscript to Chloe Unity Blair.

Tagged Version of the Letter

[TEI Header goes here]

<text id="Boo1j08">
<div1 type="letter" n="1861-10-08">
<pb n="1" />

<opener>
<dateline>
<name type="place">Manassas
<orig reg="Junction">junction</orig></name>
<date value="1861-10-08">
<abbr expan="October">Oct.</abbr>
8<hi rend="superscript">th</hi> 1861</date>
</dateline>
<salute>Dear Cousin</salute>
</opener>

<p>I write
<orig reg="a few">afew</orig>
lines this <lb />
morning to inform you that I am well <lb />
at this time and <orig reg="hoping">hopeing</orig> that it <lb />
may find you all <orig reg="enjoying">injoying</orig> the same <lb />
<orig reg="blessing. The">blesing, the</orig> health of our company<lb />
is better at this time than it has <lb />
<orig reg="been">bin</orig> for some <orig reg="time.">time,</orig> </p>

<p>I have no news of <orig reg="interest">intrust</orig> to write <lb />
to <orig reg="you. It">you, it</orig> is thought that we <lb />
will have a battle in a few <orig reg="days. It's">days, its</orig><lb />
reported that <orig reg="there">thay</orig> was fighting <lb />
yesterday at <name type="place"><orig reg="Falls Church.">fawls
Church</orig></name> I <orig reg="don't">dont</orig>
<add resp="editor">know</add>
<orig reg="whether">weth <lb />
er</orig> it was so or <orig reg="not. One">not, one</orig>
of the <orig reg="Danville">Dan <lb />
ville</orig> Grays was <orig reg="up to">upto</orig> see us last
<orig reg="night.">night</orig> <lb />
<orig reg="He">he</orig> said the yankees was in four <lb />
miles of <orig reg="them.">them</orig>
<orig reg="They">thay</orig> are stationed at <lb />
<name type="place"><orig reg="Fairfax">Farfax</orig> Court House</name>
six miles <orig reg="ahead">a head</orig> of <lb />
<orig reg="us. It">us, it</orig> is thought that we will <lb />
have a <orig reg="very">verry</orig> hard battle when it <lb />
does come <orig reg="off.">off,</orig> I received a letter from <lb />
<name type="person">Addie</name><note target="n1">[1]</note>
last <orig reg="evening. It">eavning it</orig>
afforded me <lb />
great pleasure to hear that he was <lb />
<orig reg="improving">improveing</orig> so <orig reg="fast.">fast,</orig> </p>

<closer>
<salute>I will <orig reg="add">ad</orig> no more at
<unclear reason="under folded page edge">present</unclear>
so <orig reg="goodbye.">good bye</orig></salute>

<pb n="2" />
<orig reg="Write">write</orig> soon to your affectionate Cousin
<signed><name type="person">James Booker</name></signed>
</closer>
<seg type="recipient">To Miss C. U. Blair</seg>
</div1>

<div1 type="notes">
<note id="n1">[1] "Addie" probably refers to Drury Addison Blair (1839-1864),
the Bookers' cousin. Blair joined Company D when it was formed in May of
1861, but was discharged due to chronic bronchitis in August of 1861 (Gregory 81).
See James Booker's letter of July 14, 1861, in which "A. Blair" includes a postscript to Chloe Unity Blair.</note>
</div1>
</text>


In any tagging project, one makes particular choices based on the goals of the project, the problems posed by the documents to be tagged, in-house procedures, and the standards of TEI Lite. With the Booker Collection, we were presented with letters written by men who had rather rudimentary writing skills; for instance, many words are misspelled, and the Bookers use commas rather than periods to separate sentences. For some users, such errors might impede comprehension, so we wanted to present a more readable version while also preserving the original features of the document. By using the <orig> tag and designing a special SGML-to-HTML filter, we were able to make two versions of the letter accessible: the original version, a transcription that retains all of the period spelling, capitalization, and punctuation, and the modernized version, a transcription with modernized spelling, capitalization, and punctuation. In adding <orig> tags, we were fairly light-handed; we included the standard spelling of words, and standardized punctuation by replacing the sentence-ending commas with periods. In addition to regularizing spelling and punctuation, we tagged places and names with the <name> tag, and we added an informational note about a soldier mentioned in the letter. Although our tagging was quite comprehensive, we could have tagged even more information if we had decided it was important to our project; for instance, we might have marked the "Danville Grays," a company in the Confederate Army, with a tag such as <name type="company">, and we might have standardized capitalization by tagging "yankee" with <orig reg="Yankee">.
