Status of the DTD

During the DOBES Hannover workshop (January 12-13, 2001) the following collective decisions were made about annotation layers:


"1. There should be two obligatory tiers, a rendered text tier and a

translation tier, plus an obligatory index for the relevant signal part.


2. The data storage and processing model should be such that

a. it supports Advanced Glossing as a surface format

b. other surface formats are interpretable as a selection from and/or a

combination of AG tiers or parts of AG tiers

c. it should be possible to deal with other annotation domains in a

compatible way"


The TIDEL team was asked to formalise the first decision in the form of a Document Type Definition.


Note that this DTD in no way describes the annotation format that will be used as the archiving format for the DOBES project. The TIDEL team intends to use the Atlas Interchange Format (AIF; link to LDC), or a format based on AIF, if AIF becomes available in a form that also allows us to support the second Hannover decision.

Of course we will support importing documents that conform to this DTD into the DOBES archive.


Some explanation of the DTD


An ANNOTATION_DOCUMENT consists of a HEADER and zero or more CHUNKS (we chose a neutral term for the basic unit of time alignment, since some people might object to using a term like UTTERANCE).
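
As a rough sketch of this structure, the top-level content model could look like the following (the element names are taken from the description in this section; the exact declarations in the actual DTD may differ):

  <!-- An ANNOTATION_DOCUMENT holds one HEADER followed by zero or more CHUNKs -->
  <!ELEMENT ANNOTATION_DOCUMENT (HEADER, CHUNK*)>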


An ANNOTATION_DOCUMENT has a number of required attributes that are needed for document housekeeping.

DATE - a date string of the form "yyyymmdd" recording the date of creation

VERSION - a sequence number recording the revision of the document

AUTHOR - the name of the document's author

FORMAT - a version number indicating the version of the DTD to be used; in this case always "1.0"
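
In DTD terms, these housekeeping attributes might be declared roughly as follows; the CDATA types are an assumption, while #REQUIRED follows from the attributes being obligatory:

  <!-- DATE: creation date ("yyyymmdd"); VERSION: revision number;
       AUTHOR: author name; FORMAT: DTD version, here always "1.0" -->
  <!ATTLIST ANNOTATION_DOCUMENT
    DATE    CDATA #REQUIRED
    VERSION CDATA #REQUIRED
    AUTHOR  CDATA #REQUIRED
    FORMAT  CDATA #REQUIRED>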


The HEADER stores a number of attributes that describe the alignment with an audio or video file.

MEDIA_FILE - the name of the media file that is annotated by this document

TIME_UNITS - a string indicating how to interpret the start and end times used for media alignment. Use either "milliseconds", "PAL_frames" or "NTSC_frames".
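
A corresponding sketch for the HEADER; the EMPTY content model and the #REQUIRED defaults are assumptions, while the enumerated TIME_UNITS values come from the description above:

  <!-- The HEADER only carries the media-alignment attributes -->
  <!ELEMENT HEADER EMPTY>
  <!ATTLIST HEADER
    MEDIA_FILE CDATA #REQUIRED
    TIME_UNITS (milliseconds | PAL_frames | NTSC_frames) #REQUIRED>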


Each CHUNK contains a RENDERED_TEXT and a TRANSLATION and is associated with a number of required attributes.

SPEAKER - an identifier for the person who produced the spoken words represented by the text. This identifier should be used consistently throughout the document.

START - the start time of the CHUNK, expressed as the number of TIME_UNITS counted as an offset from the beginning of MEDIA_FILE

END - the end time of the CHUNK, expressed in the same way
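
Putting the pieces together, a CHUNK might be declared and used roughly as shown below. The #PCDATA content models, the CDATA attribute types and all values in the instance fragment are made-up examples rather than part of the description above.

  <!ELEMENT CHUNK (RENDERED_TEXT, TRANSLATION)>
  <!ELEMENT RENDERED_TEXT (#PCDATA)>
  <!ELEMENT TRANSLATION (#PCDATA)>
  <!-- SPEAKER: consistent speaker identifier; START/END: offsets in TIME_UNITS -->
  <!ATTLIST CHUNK
    SPEAKER CDATA #REQUIRED
    START   CDATA #REQUIRED
    END     CDATA #REQUIRED>

  <!-- A hypothetical instance fragment: one chunk, aligned in milliseconds -->
  <ANNOTATION_DOCUMENT DATE="20010113" VERSION="1" AUTHOR="N.N." FORMAT="1.0">
    <HEADER MEDIA_FILE="session1.wav" TIME_UNITS="milliseconds"/>
    <CHUNK SPEAKER="SPK1" START="1200" END="3450">
      <RENDERED_TEXT>rendered text of the chunk</RENDERED_TEXT>
      <TRANSLATION>its translation</TRANSLATION>
    </CHUNK>
  </ANNOTATION_DOCUMENT>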