
The TEI header

Guidelines for SGML Text Mark-up at the Electronic Text Center
David Seaman, Electronic Text Center, University of Virginia

The TEI header is a vital part of any text we prepare. It records the print source of the electronic text, how the electronic text was created, and the work we have done on it, and it provides various date and keyword fields for our search tools. It is also the source of the USMARC record that goes into our online library catalog.

The Web Forms Header Template

UVa text processors use a "fill-in-the-blanks" web form to create TEI headers. The form reads in an SGML template, configures itself to match it, and saves out a valid TEI header along with an automatically generated MARC record.

Below are some examples of the principal different types of printed and manuscript materials for which we create headers:


The version of the TEI header that we use comprises four major sections:

<teiHeader>
  • <fileDesc>...</fileDesc>

  • <encodingDesc>...</encodingDesc>

  • <profileDesc>...</profileDesc>

  • <revisionDesc>...</revisionDesc>

</teiHeader>

  1. The File Description -- <fileDesc> -- contains a full bibliographical description of the computer file -- title, author, creator of electronic version, publisher of electronic version, the size of completed file, in KB -- along with information about the printed source from which the electronic text was derived (contained within the <sourceDesc>).

    Notes: annotations about the electronic text go in the first <notesStmt> field; notes about the physical object -- the book in hand -- go in the <notesStmt> field in the <sourceDesc> field. It can be difficult sometimes to determine which is which -- ask for help in this case. In disputed cases, default to the <notesStmt> field in the <sourceDesc>.

    Editions, impressions, reprints: if in doubt about what constitutes an edition and an impression, see David or Catherine. As a rule of thumb, identical pagination and lineation between two versions of a text means that they are different impressions of the same edition -- they have been printed from the same physical printing plates. Covers, illustrations, titlepage dates may of course be quite different between two such impressions.

  2. The Encoding Description -- <encodingDesc> -- allows for detailed description of whether (or how) the text was normalized during transcription, how the encoder resolved ambiguities in the source, what levels of encoding or analysis were applied, and so on.

  3. The Text Profile Description -- <profileDesc> -- provides a detailed description of non-bibliographic aspects of the text, specifically the languages used, the situation in which it was produced, the participants, and their setting.

    Note that the <date> field in the <creation> section is vital; OpenText reads this when it constructs its "Centuries" document structures. A missing or incorrect <date> here will result in the work being left out or misplaced in the "Centuries" group.

    The <keywords> fields should always include the following:


    • either "fiction" or "non-fiction"

    • always use at least one of the following: "drama" ; "prose" ; "poetry". For drama, if in verse, add "verse".

    • always "masculine" or "feminine"; if joint authorship, use both.

    • when appropriate, use any of the following: African American/Native American/American Civil War/Thomas Jefferson/Women Writers/Young Readers/Literature in Translation/Special Collections.
      To get a feel for how we use these, see the subsets online under Modern English.


  4. The Revision History Description -- <revisionDesc> -- allows present and future encoders to provide a history of changes made during the development of the electronic text.
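
As a sketch of how the date and keyword rules in item 3 fit together, the fragment below fills in a <profileDesc> for a hypothetical English-language novel by a woman author, first published in 1850. The date, the language code, and the choice of one <term> per keyword are illustrative assumptions rather than Center policy.

```xml
<profileDesc>
<creation>
<!-- hypothetical first-publication date; OpenText reads this for the "Centuries" groups -->
<date>1850</date>
</creation>
<langUsage>
<!-- "eng" is the ISO 639 code for English -->
<language id="eng">English</language>
</langUsage>
<textClass>
<keywords>
<!-- one <term> per keyword is an assumption made for readability -->
<term>fiction</term>
<term>prose</term>
<term>feminine</term>
<term>Women Writers</term>
</keywords>
</textClass>
</profileDesc>
```

Each required choice is covered here: fiction or non-fiction, a form (prose), and the author's gender, plus one optional subset keyword.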

The University of Virginia Etext Center Header: TEMPLATE

<teiHeader type="aacr2">
<fileDesc>

<titleStmt>
<title> The work's title [a machine-readable transcription]</title>
<author>The work's author, last name first</author>
<respStmt>
<resp>Creation of machine-readable version: </resp>
<name>creator of electronic version</name>
<resp>Creation of digital images: </resp>
<name>creator of image(s)</name>
<resp>Conversion to TEI.2-conformant markup: </resp>
<name>University of Virginia Library Electronic Text Center.</name>
</respStmt>
</titleStmt>
<extent>ca. XXX kilobytes </extent>
<publicationStmt>
<publisher>University of Virginia Library.</publisher>
<pubPlace>Charlottesville, Va.</pubPlace>
<idno type="ETC">collection and ID, e.g. Modern English, AusEmma</idno>
<availability>
<p>Place where text can be found, e.g. Available from: Oxford Text Archive</p>
<p>URL: http://etext.lib.virginia.edu/modeng.browse.html</p>
<p>Available commercially from:</p>
</availability>
<date>Current year</date>
</publicationStmt>
<seriesStmt>
<p>Name of electronic series, if any</p>
</seriesStmt>
<notesStmt>
<note>Illustrations have been included from the print version. Note about image, if needed; note, for instance, if source differs from print source.</note>
<note>any other notes</note>
</notesStmt>
<sourceDesc>

<biblFull>
<titleStmt>
<title>The work's title</title>
<title level="a|m|j|s|u">The title of the physical volume, if different</title>
<author>The author's name, first name first</author>
<respStmt>
<resp>e.g. Editor / Translator / Annotator</resp>
<name></name>
</respStmt>
</titleStmt>
<editionStmt>
<p>Edition information, e.g. 1st Edition.</p>
</editionStmt>
<extent></extent>
<publicationStmt>
<publisher></publisher>
<pubPlace>place of publication</pubPlace>
<date>date of publication</date>
</publicationStmt>
<seriesStmt>
<p>Name of print series.</p>
</seriesStmt>
<notesStmt>
<note></note>
</notesStmt>
</biblFull>

</sourceDesc>

</fileDesc>
<encodingDesc>

<projectDesc>
<p>Prepared for the University of Virginia Library Electronic Text Center.</p>
</projectDesc>
<editorialDecl>
<p>All quotation marks retained as data.</p>
<p>Spell-check and verification made against printed text using WordPerfect spell checker.</p>
<p>All unambiguous end-of-line hyphens have been removed, and the trailing part of a word has been joined to the preceding line.</p>
<p>The images exist as archived TIFF images, one or more JPEG versions for general use, and thumbnail GIFs.</p>
<p id="ETC">Keywords in the header are a local Electronic Text Center scheme to aid in establishing analytical groupings.</p>
</editorialDecl>
<refsDecl>
<p>ID elements are given for each page element and are composed of the text's unique cryptogram and the given page number, as in AusEmma1 for page one of Jane Austen's Emma.</p>
</refsDecl>
<classDecl>
<taxonomy id="LCSH">
<bibl>
<title>Library of Congress Subject Headings</title>
</bibl>
</taxonomy>
</classDecl>

</encodingDesc>
<profileDesc>

<creation>
<date>First published date</date>
</creation>
<langUsage>
<language id="">languages used in the text; use one "language" pair of tags for each language, and for the id= value, use an ISO639 code</language>
</langUsage>
<textClass>
<keywords>
<term>fiction or non-fiction; poetry, prose, or drama</term>
</keywords>
<keywords scheme="LCSH">
<term>LCSH</term>
</keywords>
</textClass>
<textClass>
<keywords>
<term type="artist">name of illustrator, painter, etc. </term>
<term type="visual work">engraving/painting/illustration, </term>
</keywords>
<keywords>
<term>24-bit color; 600 dpi [or variant]</term>
</keywords>
</textClass>

</profileDesc>
<revisionDesc>

<change>
<date>date of changes</date>
<respStmt>
<resp>corrector</resp>
<name>who made the changes</name>
</respStmt>
<item>what was done</item>
</change>

</revisionDesc>
</teiHeader>



The Tags Exclusive to the Header

The global attributes are as follows:

  • n=
  • id=
  • rend=
  • lang=


<teiHeader>
supplies the descriptive and declarative information making up an "electronic title page" prefixed to every TEI-conformant text.

May contain: encodingDesc fileDesc profileDesc revisionDesc

Attributes: global plus the following:

type : specifies the kind of document to which the header is attached.

creator : identifies the creator of the teiHeader, using the name or initials of the person or institution responsible.

status : indicates whether the header is new or has been substantially revised.

Legal values are: "new" or "update".

date.created : indicates when the first version of the header was created.

date.updated : indicates when the current version of the header was created.
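
To make the attribute list concrete, here is a minimal sketch of a <teiHeader> start tag that uses the attributes described above. Only type="aacr2" comes from the Center's template; the creator value and the date are hypothetical, and the date format shown is an assumption.

```xml
<!-- all attribute values except type="aacr2" are hypothetical -->
<teiHeader type="aacr2" creator="ETC" status="new" date.created="1997-01-15">
<!-- fileDesc, encodingDesc, profileDesc, and revisionDesc go here -->
</teiHeader>
```

When a header is substantially revised, status would change to "update" and a date.updated attribute would be added alongside date.created.
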
<fileDesc>
contains a full bibliographic description of an electronic file including statements of responsibility and a full bibliographic description for the source or sources from which the electronic text was derived.

May contain: editionStmt extent notesStmt publicationStmt seriesStmt sourceDesc titleStmt

Attributes: global
<titleStmt>

May contain: title author editor sponsor funder principal respStmt

Attributes: global
<sponsor>

May occur within: titleStmt

May contain: #PCDATA ident code kw abbr address date name num rs time add corr del gap orig reg sic unclear emph foreign gloss hi mentioned soCalled term title ptr ref xptr xref anchor s seg gi formula

Attributes: global
<funder>
specifies the name of an individual, institution, or organization responsible for the funding of a project or text. Funders provide financial support for a project; they are distinct from sponsors, who provide intellectual support and authority.

May occur within: titleStmt

May contain: #PCDATA abbr add address code corr date del emph foreign formula gap gi gloss ident hi kw lang mentioned name num orig ref reg rs s seg sic soCalled term time title xptr xref

Attributes: global
<principal>

May contain: #PCDATA ident code kw abbr address date name num rs time add corr del gap orig reg sic unclear emph foreign gloss hi mentioned soCalled term title ptr ref xptr xref anchor s seg gi formula

Attributes: global
<editionStmt>
groups information relating to one edition of a text.

May contain: edition respStmt p

Attributes: global

Example:

<editionStmt>
<edition n=S2>Students' edition</edition>
<respStmt> <resp>Adapted by </resp><name>Elizabeth Kirk</name>
</respStmt>
</editionStmt>
<edition>
describes the particularities of one edition of a text.

May occur within: bibl editionStmt

May contain: #PCDATA abbr add address anchor code corr date del emph foreign formula gap gi gloss hi ident kw mentioned name num orig ptr ref reg rs s seg sic soCalled term time title xptr xref

Attributes: global

Example:

<edition>First edition <date>Oct 1990</date> </edition>
<edition n=S2>Students' edition </edition>
<extent>
describes the approximate size of the electronic text as stored on some carrier medium, specified in any convenient units.

May occur within: bibl biblFull fileDesc

May contain: #PCDATA abbr add address anchor code corr date del emph foreign formula gap gi gloss hi ident kw mentioned name num orig ptr ref reg rs s seg sic soCalled term time title xptr xref

Attributes: global

Example:

<extent>3200 sentences </extent>
<extent>ten 3.5 inch high density diskettes </extent>
<publicationStmt>
groups information concerning the publication or distribution of an electronic or other text.

May occur within: biblFull fileDesc

May contain: address authority availability date distributor idno p publisher pubPlace

Attributes: global

Example:

<publicationStmt>
<publisher>Chadwyck Healey </publisher>
<pubPlace>Cambridge </pubPlace>
<availability><p>Available under licence only</p></availability>
<date>1992 </date>
</publicationStmt>
<distributor>
supplies the name of a person or other agency responsible for the distribution of a text.

May occur within: publicationStmt

May contain: #PCDATA abbr add address anchor code corr date del emph foreign formula gap gi gloss hi ident kw mentioned name num orig ptr ref reg rs s seg sic soCalled term time title xptr xref

Attributes: global
<authority>

May contain: #PCDATA ident code kw abbr address date name num rs time add corr del gap orig reg sic unclear emph foreign gloss hi mentioned soCalled term title ptr ref xptr xref anchor s seg gi formula

Attributes: global
<idno>
supplies any standard or non-standard number used to identify a bibliographic item.

May occur within: bibl publicationStmt seriesStmt

May contain: #PCDATA

Attributes: global plus the following:

type : categorizes the number, for example as an ISBN or other standard series.
Value: A name or abbreviation indicating what type of identifying number is given (e.g. ISBN, LCCN).
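
A short sketch may help show how the type attribute distinguishes identifiers. The first <idno> follows the Center's template (collection and ID); the second is a placeholder showing where a standard number would go, not a real ISBN.

```xml
<publicationStmt>
<publisher>University of Virginia Library.</publisher>
<pubPlace>Charlottesville, Va.</pubPlace>
<!-- local identifier: collection and ID, as in the Center's template -->
<idno type="ETC">Modern English, AusEmma</idno>
<!-- placeholder only: not a real ISBN -->
<idno type="ISBN">X-XXX-XXXXX-X</idno>
</publicationStmt>
```
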
<availability>
supplies information about the availability of a text, for example any restrictions on its use or distribution, its copyright status, etc.

May occur within: publicationStmt

May contain: p

Attributes: global plus the following:

status : supplies a code (free, unknown, or restricted) identifying the current availability of the text:

free : the text is freely available.
unknown : the status of the text is unknown.
restricted : the text is not freely available.

Examples:

<availability status="restricted">
<p>Available for academic research purposes only.</p>
</availability>

<availability status="free">
<p>In the public domain.</p>
</availability>

<availability status="restricted">
<p>Available under licence from the publishers.</p>
</availability>
<seriesStmt>
groups information about the series, if any, to which a publication belongs.

May occur within: biblFull fileDesc

May contain: idno p respStmt title

Attributes: global

Example:

<seriesStmt>
<title>Machine-Readable Texts for the Study of Indian Literature</title>
<respStmt>
<resp>ed. by</resp> <name>Jan Gonda</name>
</respStmt>
<idno type=vol>1.2</idno>
<idno type=ISSN>0 345 6789</idno>
</seriesStmt>
<notesStmt>
collects together any notes providing information about a text additional to that recorded in other parts of the bibliographic description.

May occur within: biblFull fileDesc
May contain: note

Attributes: global

Example:

<notesStmt>
<note>OCR scanning done at University of Toronto</note>
</notesStmt>
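
The File Description notes earlier in these guidelines distinguish two <notesStmt> placements: one for the electronic text and one, inside <sourceDesc>, for the physical book. The following sketch shows both in position. The first note is taken from the Center's template; the wording of the second is hypothetical.

```xml
<fileDesc>
<!-- titleStmt, extent, and publicationStmt omitted for brevity -->
<notesStmt>
<!-- a note about the electronic text -->
<note>Illustrations have been included from the print version.</note>
</notesStmt>
<sourceDesc>
<biblFull>
<!-- titleStmt and publicationStmt omitted for brevity -->
<notesStmt>
<!-- a note about the book in hand; wording is hypothetical -->
<note>Copy transcribed lacks the half-title.</note>
</notesStmt>
</biblFull>
</sourceDesc>
</fileDesc>
```
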
<sourceDesc>
supplies a bibliographic description of the copy text(s) from which an electronic text was derived or generated.

May occur within: biblFull fileDesc

May contain: bibl biblFull p

Attributes: global plus the following:

default : values YES | NO

Example:

<sourceDesc>
<p>No source: created in machine-readable form.</p>
</sourceDesc>
<encodingDesc>
documents the relationship between an electronic text and the source or sources from which it was derived.

May contain: projectDesc samplingDecl editorialDecl tagsDecl refsDecl classDecl


Attributes: global

<projectDesc>
May contain: p

Attributes: global plus the following:

default : values: YES | NO
<samplingDecl>

May contain: p

Attributes: global plus the following:

default: YES | NO
<editorialDecl>

May contain: p

Attributes: global plus the following:

default: YES | NO
<tagsDecl>

May contain: rendition tagUsage

Attributes: global
<tagUsage>

May contain: #PCDATA ident code kw abbr address date name num rs time add corr del gap orig reg sic unclear emph foreign gloss hi mentioned soCalled term title ptr ref xptr xref anchor s seg gi formula eg bibl biblFull cit q label list listBibl note figure stage table text

Attributes: global plus the following:

gi
occurs
ident
render
<rendition>

May contain: #PCDATA ident code kw abbr address date name num rs time add corr del gap orig reg sic unclear emph foreign gloss hi mentioned soCalled term title ptr ref xptr xref anchor s seg gi formula eg bibl biblFull cit q label list listBibl note figure stage table text

Attributes: global
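
Since the entries for <tagsDecl>, <tagUsage>, and <rendition> list only content and attribute names, a small sketch may show how they combine. It assumes, following our reading of the TEI Guidelines, that gi names the element, occurs counts its occurrences, ident counts the occurrences that carry an id, and render points to the id of a <rendition>. The id value and the counts are hypothetical.

```xml
<tagsDecl>
<!-- the id value and the counts below are hypothetical -->
<rendition id="IT">Rendered in italic type.</rendition>
<tagUsage gi="hi" occurs="467" ident="0" render="IT">Used for italicized words and phrases.</tagUsage>
</tagsDecl>
```
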
<refsDecl>
specifies how canonical references are constructed for this text.

Occurs within: encodingDesc

Contains: p

Attributes: global plus the following:

doctype : identifies the document type within which this reference declaration is used.
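
The Center's template describes page IDs built from the text's cryptogram plus the page number (AusEmma1 for page one of Emma). The sketch below pairs that declaration with the page markers it describes, assuming that the page element is <pb> and that the page number is also recorded in the global n attribute.

```xml
<refsDecl>
<p>ID elements are given for each page element and are composed of the text's unique cryptogram and the given page number.</p>
</refsDecl>

<!-- later, in the body of the text -->
<!-- in SGML (TEI.2) these empty elements are written without the trailing slash -->
<pb n="1" id="AusEmma1"/>
<pb n="2" id="AusEmma2"/>
```
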
<classDecl>

May contain: taxonomy

Attributes: global
<taxonomy>
defines a typology used to classify texts either implicitly, by means of a bibliographic citation, or explicitly by a structured taxonomy.

May occur within: classDecl

May contain: bibl biblFull biblStruct category

Attributes: global

Example:

<taxonomy id="B">
<bibl>Brown Corpus</bibl>
<category id="B.A"><catDesc>Press Reportage</catDesc>
<category id="B.A1"><catDesc>Daily</catDesc></category>
<category id="B.A2"><catDesc>Sunday</catDesc></category>
<category id="B.A3"><catDesc>National</catDesc></category>
<category id="B.A4"><catDesc>Provincial</catDesc></category>
<category id="B.A5"><catDesc>Political</catDesc></category>
<category id="B.A6"><catDesc>Sports</catDesc></category>
</category>
</taxonomy>
<category>

May contain: catDesc category

Attributes: global
<catDesc>

May contain: #PCDATA ident code kw abbr address date name num rs time add corr del gap orig reg sic unclear emph foreign gloss hi mentioned soCalled term title ptr ref xptr xref anchor s seg gi formula

Attributes: global
<profileDesc>
provides a detailed description of non-bibliographic aspects of a text, specifically the languages and sublanguages used, the situation in which it was produced, the participants and their setting.

May occur within: teiHeader

May contain: creation langUsage textClass

Attributes: global
<creation>
contains information about the creation of a text. The <creation> element may be used to record details of a text's creation, e.g. the date and place it was composed, if these are of interest; it should not be confused with the <publicationStmt> element, which records date and place of publication.

May occur within: profileDesc

May contain: #PCDATA abbr add address anchor corr date del emph foreign formula gap gi gloss hi mentioned name num orig ptr ref reg rs s seg sic soCalled term time title xptr xref

Attributes: global

Example:

<creation><date>Before 1987</date></creation>
<creation><date value="1988-07-10">10 July 1988</date></creation>
<langUsage>
describes the languages, sublanguages, registers, dialects etc. represented within a text. May contain either a simple prose description, or more formally one or more <language> elements

May occur within: profileDesc

May contain: language p

Attributes: global
<language>
characterizes a single language or sublanguage used within a text.

May occur within: langUsage

May contain: #PCDATA

Attributes: global plus the following:

iso639 : gives the standard language code from ISO 639. Value: any two- or three-letter code included in ISO 639; if the language is not included in the list in ISO 639, the value should be given as the empty string.

Example:

<language iso639="GRC">Classical Greek</language>
<textClass>
groups information which describes the nature or topic of a text in terms of a standard classification scheme, thesaurus, etc.

Attributes: global
<keywords>
contains a list of keywords or phrases identifying the topic or nature of a text.

May contain: list term

Attributes: global plus the following:

scheme : identifies the controlled vocabulary within which the set of keywords concerned is defined.

Example:

<keywords scheme="BL">
<list>
<item>Babbage, Charles</item>
<item>Mathematicians - Great Britain - Biography</item>
</list>
</keywords>
<classCode>

May contain: #PCDATA ident code kw abbr address date name num rs time add corr del gap orig reg sic unclear emph foreign gloss hi mentioned soCalled term title ptr ref xptr xref anchor s seg gi formula

Attributes: global, plus the following:

scheme IDREF #IMPLIED
<catRef>
Empty tag

Attributes: global, plus the following:

target
scheme
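
Because <classCode> and <catRef> are listed without examples, here is a hedged sketch of how each might classify a text against the Brown Corpus taxonomy shown under <taxonomy>. It assumes that both elements sit inside <textClass>, that scheme points to the id of a <taxonomy>, and that target points to the id of a <category>. In practice an encoder would normally choose one of the two.

```xml
<textClass>
<!-- "B" and "B.A1" are the taxonomy and category ids from the Brown Corpus example -->
<classCode scheme="B">B.A1</classCode>
<!-- written as <catRef target="B.A1" scheme="B"> in SGML, with no trailing slash -->
<catRef target="B.A1" scheme="B"/>
</textClass>
```
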

<revisionDesc>
summarizes the revision history for a file. Record changes with most recent changes at the top of the list.

May occur within: teiHeader

May contain: change list

Attributes: global

Example:

<revisionDesc>
<change><date>11 Nov 91</date>
<respStmt><name>EB</name></respStmt>
<item>Deleted chapter 10</item>
</change>
</revisionDesc>
<change>
summarizes a particular change or correction made to a particular version of an electronic text which is shared between several researchers.

May occur within: revisionDesc
May contain: date item respStmt

Attributes: global
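
To illustrate the rule that the most recent change goes at the top, the following sketch records two changes using the element order from the Center's template (date, respStmt, item). The dates, names, and descriptions are hypothetical.

```xml
<revisionDesc>
<!-- most recent change first; all dates, names, and descriptions are hypothetical -->
<change>
<date>March 1997</date>
<respStmt>
<resp>corrector</resp>
<name>Second Corrector</name>
</respStmt>
<item>Corrected page-break IDs.</item>
</change>
<change>
<date>January 1997</date>
<respStmt>
<resp>corrector</resp>
<name>First Corrector</name>
</respStmt>
<item>Added TEI header.</item>
</change>
</revisionDesc>
```
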
<respStmt>
supplies a statement of responsibility for someone responsible for the intellectual content of a text, edition, recording, or series, where the specialized elements for authors, editors, etc. do not suffice or do not apply.

May occur within: bibl change editionStmt series seriesStmt titleStmt

May contain: name resp

Attributes: global

Example:

<respStmt><resp>transcribed from original ms</resp>
<name>Claus Huitfeldt</name>
</respStmt>

In addition, the TEI header includes the following tags, described in the longer list of general TEI Lite tags:

<TEI.2> <author> <resp> <name> <extent> <publisher> <date> <biblFull> <title> <term>

