Format conversion tools for Praat, Transcriber and TASX Annotator

D. Gibbon, Updated September 2002



Description

The audio signal annotation tools Praat and Transcriber are useful for different purposes, but sometimes, particularly for archiving, their functions overlap. It is then necessary to convert one format into the other in order to have a standardised archive format.

The formats have different properties and permit different types and quantities of information to be represented. Praat permits arbitrary numbers of annotation tiers to be included, for instance, while Transcriber only permits two. Transcriber permits a header with metadata to be included, while Praat does not. Consequently, conversion can be lossy, and multiple conversions backwards and forwards will lead to a representation containing only the intersection of the information that Praat and Transcriber can each represent.
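The effect of repeated conversion can be illustrated with a toy model. The sketch below is not how the converters work internally: the dictionary layout, the function names to_praat_like and to_transcriber_like, and the choice of which tiers survive are all assumptions made for illustration. It only encodes the two constraints stated above, namely no metadata header on the Praat side and at most two tiers on the Transcriber side.

```python
# Toy model of lossy round-trip conversion (illustrative assumptions only;
# this is not the data model used by Praat, Transcriber or the Perl scripts).
MAX_TRANSCRIBER_TIERS = 2  # limit stated in the description above


def to_praat_like(doc):
    """Keep all tiers, drop the metadata header (Praat has none)."""
    return {"metadata": {}, "tiers": dict(doc["tiers"])}


def to_transcriber_like(doc):
    """Keep the metadata header, keep only the first two tiers (assumed rule)."""
    kept = dict(list(doc["tiers"].items())[:MAX_TRANSCRIBER_TIERS])
    return {"metadata": dict(doc["metadata"]), "tiers": kept}


# Hypothetical annotation with three tiers and one metadata field.
original = {
    "metadata": {"speaker": "S1"},
    "tiers": {"words": ["hello"], "sampa": ["h@l@U"], "gesture": ["wave"]},
}

round_trip = to_praat_like(to_transcriber_like(original))
print(sorted(round_trip["tiers"]))  # tier names that survived the round trip
print(round_trip["metadata"])       # metadata that survived the round trip
```

After one pass through both formats, the third tier and the metadata are gone, and further conversions cannot bring them back.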

Because of the lossiness of direct conversions, a more flexible format is required into which both can be converted back and forth without loss. The TASX format in XML developed by Jan-Torsten Milde and Ulrike Gut was designed for this purpose. The TASX format is both a generic XML annotation format and the specific format used by the TASX Annotator video and audio signal annotation software; it is easily converted into other XML formats using XML transformation tools.

The following tools were developed in order to interconvert Praat, Transcriber and TASX annotation files. They should be used bearing the above remarks about conversion limitations in mind. The tools were developed by different authors in Perl for ad hoc applications, and are not always fully structured and documented for distribution purposes. Consequently, although they can most likely be used as they stand, we recommend that they be regarded as a source of information and ideas, and preferably re-implemented for various platforms and uses. The Perl scripts were designed to be run on UNIX/Linux machines, and with the usual minor modifications should also run on other platforms.



Sample Praat and Transcriber files

cyprien_greeting1.TextGrid

Sample Praat file.



cyprien_greeting1.trs

Sample Transcriber file.



Conversion between Praat and Transcriber annotation formats

trans2praat.pl

Format converter, reads Transcriber files and creates Praat TextGrid files, sending them to STDOUT:

USAGE: trans2praat.pl INFILE > OUTFILE



praat2tr.pl

Format converter, reads Praat TextGrid files and creates Transcriber files, sending them to STDOUT (note: possibly a lossy conversion, works on single tiers only):

USAGE: praat2tr.pl TIERNUMBER INFILE > OUTFILE



Converting and inserting SAMPA and Praat IPA transcription tiers automatically

generatepraattier.pl

Generates a SAMPA or IPA tier in a Praat file using an existing IPA or SAMPA tier (copying or creation of a new tier):

USAGE: generatepraattier.pl TIERNUMBER SOURCEFORMAT INFILE OUTFILE

where

TIERNUMBER is the source tier,

SOURCEFORMAT is either S (SAMPA) or P (Praat IPA);

INFILE is a Praat TextGrid file; OUTFILE is also a Praat TextGrid file.
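The idea of generating a tier from an existing one can be sketched as follows. This is a minimal illustration, not the logic of generatepraattier.pl: the in-memory tier representation, the function name add_converted_tier, the parameter new_name and the 1-based reading of TIERNUMBER are assumptions, and the label converter is passed in as an ordinary function.

```python
def add_converted_tier(tiers, tier_number, convert, new_name):
    """Return a new tier list with a converted copy of one tier appended.

    tiers: list of (name, intervals); intervals are (start, end, label).
    tier_number: index of the source tier, assumed here to be 1-based.
    convert: function mapping a source label to a target label.
    new_name: name of the generated tier (hypothetical parameter).
    """
    if not 1 <= tier_number <= len(tiers):
        raise ValueError(f"no tier number {tier_number}")
    _, source = tiers[tier_number - 1]
    # Time boundaries are copied unchanged; only the labels are converted.
    generated = [(start, end, convert(label)) for start, end, label in source]
    return tiers + [(new_name, generated)]


tiers = [("sampa", [(0.0, 0.4, "S"), (0.4, 0.9, "@")])]
# str.lower is only a stand-in for a real SAMPA/IPA label converter.
result = add_converted_tier(tiers, 1, str.lower, "generated")
print(result[1])
```

The original tier list is left untouched, so the source tier is always preserved alongside the generated one.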



Praat font conversions

praat2sampa.pl

Praat font converter: reads Praat-IPA symbols from the input file and produces the corresponding SAMPA-IPA notation as ASCII combinations, sending them to STDOUT:

USAGE: praat2sampa.pl INFILE > OUTFILE



sampa2praat.pl

Praat font converter: reads SAMPA-IPA symbols from the input file and produces the corresponding Praat-IPA notation as ASCII combinations, sending them to STDOUT:

USAGE: sampa2praat.pl INFILE > OUTFILE
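Both font converters amount to table lookups between two ASCII encodings of IPA. The sketch below shows the principle with a small subset of symbols. It is not the table used by praat2sampa.pl or sampa2praat.pl, which is not reproduced here; the function names are hypothetical, and symbols outside the subset are simply passed through unchanged.

```python
import re

# Small illustrative subset of Praat backslash codes and their SAMPA
# equivalents; NOT the conversion table of the Perl scripts.
PRAAT_TO_SAMPA = {
    r"\sw": "@",  # schwa
    r"\sh": "S",  # esh
    r"\zh": "Z",  # ezh
    r"\ng": "N",  # eng
    r"\tf": "T",  # theta
    r"\dh": "D",  # eth
    r"\ae": "{",  # ash
    r"\ic": "I",  # small capital I
    r"\hs": "U",  # horseshoe u
}
SAMPA_TO_PRAAT = {sampa: praat for praat, sampa in PRAAT_TO_SAMPA.items()}


def praat_to_sampa(label):
    """Replace each known backslash code (backslash + two characters)."""
    return re.sub(
        r"\\..",
        lambda m: PRAAT_TO_SAMPA.get(m.group(0), m.group(0)),
        label,
    )


def sampa_to_praat(label):
    """Replace each known single-character SAMPA symbol."""
    return "".join(SAMPA_TO_PRAAT.get(ch, ch) for ch in label)


print(praat_to_sampa(r"f\ic\sh"))
print(sampa_to_praat("fIS"))
```

A character-by-character lookup is enough for this subset, but a fuller SAMPA table contains multi-character symbols and would need longest-match handling.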



Praat / TASX conversion

praat2tasx.pl

Generates TASX-formatted annotations from the Praat TextGrid file INFILE, sending them to STDOUT:

USAGE: praat2tasx.pl INFILE > OUTFILE



tasx2textgrid.pl

Converts TASX-formatted files into Praat TextGrid files, reading INFILE and sending the result to STDOUT:

USAGE: tasx2textgrid.pl INFILE > OUTFILE



praat2tasxalign.pl

Generates TASX-formatted annotations from Praat TextGrid files, but shifts all timestamps in the file by a fixed time value. Used, for example, to align Praat audio annotations with video recordings for further annotation.

USAGE: praat2tasxalign.pl INFILE OUTFILE TIMESHIFT
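The time shift itself is a uniform offset applied to every interval boundary. A minimal sketch follows, assuming intervals held as (start, end, label) tuples and a TIMESHIFT value in seconds. Neither the representation nor the unit is specified above, and the function name shift_intervals is hypothetical.

```python
def shift_intervals(intervals, timeshift):
    """Add a fixed offset (assumed to be in seconds) to every start and end time."""
    return [
        (start + timeshift, end + timeshift, label)
        for start, end, label in intervals
    ]


# Hypothetical audio annotation that starts 2 seconds into the video.
audio = [(0.0, 0.5, "hello"), (0.5, 1.25, "world")]
shifted = shift_intervals(audio, 2.0)
print(shifted)
```

Interval durations are unchanged by the shift. A negative offset could produce negative timestamps, which this sketch does not guard against.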