Format conversion tools for Praat, Transcriber and TASX Annotator
D. Gibbon, Updated September 2002
The audio signal annotation tools Praat and Transcriber are useful for different purposes, but sometimes, particularly in archiving, their functions overlap. It then becomes necessary to convert one format into the other in order to maintain a standardised archive format.
The formats have different properties and permit different types and quantities of information to be represented. Praat permits arbitrary numbers of annotation tiers to be included, for instance, while Transcriber only permits two. Transcriber permits a header with metadata to be included, while Praat does not. Consequently, conversion can be lossy, and repeated conversions backwards and forwards will reduce the representation to the intersection of the information that Praat and Transcriber can each represent.
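The information loss described above can be illustrated with a toy model. The sketch below is not taken from the conversion scripts: the dictionary layout, the function names `to_praat` and `to_transcriber`, and the `max_tiers` parameter are all hypothetical, chosen only to show how a round trip reduces an annotation to what both formats can hold.

```python
# Toy model of lossy conversion. All names and the dict layout are
# hypothetical; real Praat and Transcriber files are far richer.

def to_praat(annotation):
    """Keep every tier, but drop the metadata header (Praat has none)."""
    return {"tiers": dict(annotation["tiers"])}


def to_transcriber(annotation, max_tiers=2):
    """Keep the metadata (if any), but only the first max_tiers tiers."""
    kept = dict(list(annotation["tiers"].items())[:max_tiers])
    return {"metadata": dict(annotation.get("metadata", {})), "tiers": kept}


original = {
    "metadata": {"speaker": "A"},
    "tiers": {"words": ["hello"], "phones": ["h", "@"], "tones": ["H"]},
}

# One trip each way: the third tier is lost on the way into the
# Transcriber model, the metadata on the way into the Praat model.
round_trip = to_praat(to_transcriber(original))
print(round_trip)
```

After the round trip only the first two tiers remain and the metadata is gone, which is the intersection referred to above.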
Because of the lossiness of direct conversions, a more flexible format is required into which both can be converted back and forth without loss. The TASX format in XML developed by Jan-Torsten Milde and Ulrike Gut was designed for this purpose. The TASX format is both a generic XML annotation format and the specific format used by the TASX Annotator video and audio signal annotation software; it is easily converted into other XML formats using XML transformation tools.
The following tools were developed in order to interconvert Praat, Transcriber and TASX annotation files. They should be used bearing the above remarks about conversion limitations in mind. The tools were developed in Perl by different authors for ad hoc applications, and are not always fully structured and documented for distribution purposes. Consequently, although they can most likely be used as they stand, we recommend that they be regarded as a source of information and ideas, and preferably re-implemented for various platforms and uses. The Perl scripts were designed to be run on UNIX/Linux machines, and with the usual minor modifications should also run on other platforms.
Sample Praat file.
Sample Transcriber file.
Format converter, reads Transcriber files and creates Praat TextGrid files, sending them to STDOUT:
USAGE: trans2praat.pl INFILE > OUTFILE
Format converter, reads Praat TextGrid files and creates Transcriber files, sending them to STDOUT (note: possibly a lossy conversion, works on single tiers only):
USAGE: praat2tr.pl TIERNUMBER INFILE > OUTFILE
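Because this conversion works on a single tier, the first step in any re-implementation is to pick one tier out of the TextGrid. The following is a minimal sketch of that step only, not the logic of praat2tr.pl itself. It assumes Praat's long TextGrid text format and interval tiers; the function name `read_tier` and the sample data are hypothetical.

```python
import re

# Hypothetical sample in Praat's long TextGrid text format.
SAMPLE = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 1
tiers? <exists>
size = 2
item []:
    item [1]:
        class = "IntervalTier"
        name = "words"
        xmin = 0
        xmax = 1
        intervals: size = 1
        intervals [1]:
            xmin = 0
            xmax = 1
            text = "hi"
    item [2]:
        class = "IntervalTier"
        name = "phones"
        xmin = 0
        xmax = 1
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 0.5
            text = "h"
        intervals [2]:
            xmin = 0.5
            xmax = 1
            text = "aI"
'''

# One interval: its header line, then xmin, xmax and the quoted label.
# Inside a label, Praat writes a literal double quote as two quotes.
INTERVAL = re.compile(
    r'intervals \[\d+\]:\s*xmin = (\S+)\s*xmax = (\S+)\s*text = "((?:[^"]|"")*)"'
)


def read_tier(textgrid_text, tier_number):
    """Return (name, [(xmin, xmax, label), ...]) for a 1-based tier number."""
    # Split at each "item [n]:" line; the first chunk is the file header.
    chunks = re.split(r'^[ \t]*item \[\d+\]:[ \t]*$', textgrid_text,
                      flags=re.MULTILINE)
    tiers = chunks[1:]
    if not 1 <= tier_number <= len(tiers):
        raise ValueError("no tier number %d" % tier_number)
    tier = tiers[tier_number - 1]
    name = re.search(r'name = "(.*)"', tier).group(1)
    intervals = [(float(a), float(b), label.replace('""', '"'))
                 for a, b, label in INTERVAL.findall(tier)]
    return name, intervals


name, intervals = read_tier(SAMPLE, 2)
print(name, intervals)
```

Point tiers and the short TextGrid format are deliberately not handled; a fuller re-implementation would need to cover both.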
Generates a SAMPA or IPA tier in a Praat file using an existing IPA or SAMPA tier (copying or creation of a new tier):
USAGE: generatepraattier.pl TIERNUMBER SOURCEFORMAT INFILE OUTFILE
TIERNUMBER is the source tier;
SOURCEFORMAT is either S (SAMPA) or P (Praat IPA);
INFILE is a Praat TextGrid file, and OUTFILE is also a Praat TextGrid file.
Praat font converter: reads Praat IPA symbols from the input file and produces the corresponding SAMPA notation as ASCII character combinations, sending them to STDOUT:
USAGE: praat2sampa.pl INFILE > OUTFILE
Praat font converter: reads SAMPA symbols from the input file and produces the corresponding Praat IPA notation as ASCII character combinations, sending them to STDOUT:
USAGE: sampa2praat.pl INFILE > OUTFILE
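The two font converters above are, in essence, table lookups in opposite directions. The sketch below shows the idea with a small illustrative subset of symbols. The mapping table actually used by praat2sampa.pl and sampa2praat.pl is not reproduced in this document, so the table and the function names here are assumptions; only the Praat backslash codes and SAMPA symbols themselves are standard.

```python
import re

# Illustrative subset only (an assumption, not the scripts' own table).
SAMPA_TO_PRAAT = {
    "@": r"\sw",  # schwa
    "I": r"\ic",  # small capital I
    "N": r"\ng",  # eng
    "S": r"\sh",  # esh
    "Z": r"\zh",  # ezh
    "T": r"\tf",  # theta
    "D": r"\dh",  # eth
}
PRAAT_TO_SAMPA = {praat: sampa for sampa, praat in SAMPA_TO_PRAAT.items()}


def sampa_to_praat(text):
    """Replace each SAMPA character in the table; pass others through."""
    return "".join(SAMPA_TO_PRAAT.get(ch, ch) for ch in text)


def praat_to_sampa(text):
    """Replace each known backslash code (backslash plus two characters)."""
    return re.sub(r"\\..",
                  lambda m: PRAAT_TO_SAMPA.get(m.group(0), m.group(0)),
                  text)


print(sampa_to_praat("TIN"))
print(praat_to_sampa(r"\tf\ic\ng"))
```

A complete converter would also need to handle multi-character SAMPA symbols and diacritics, for which character-by-character lookup is not sufficient.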
Generates TASX-formatted annotations from the Praat TextGrid file INFILE, sending them to STDOUT:
USAGE: praat2tasx.pl INFILE > OUTFILE
Converts TASX-formatted files into Praat TextGrid files, reading INFILE and sending the result to STDOUT:
USAGE: tasx2textgrid.pl INFILE > OUTFILE
Generates TASX-formatted annotations from Praat TextGrid files, but shifts all timestamps in the file by a fixed time value. Used, for example, to align Praat annotations of audio recordings with video recordings for further annotation:
USAGE: praat2tasx.pl INFILE OUTFILE TIMESHIFT
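The time shift itself is a simple operation on interval boundaries. As a minimal sketch, assuming the annotations have already been read into `(start, end, label)` tuples with times in seconds (the tuple layout and the function name `shift_intervals` are hypothetical and not taken from the script), it can be written as:

```python
def shift_intervals(intervals, offset):
    """Add a fixed offset in seconds to both boundaries of each interval.

    Results are rounded to microseconds to hide floating-point noise.
    """
    return [(round(start + offset, 6), round(end + offset, 6), label)
            for start, end, label in intervals]


audio = [(0.0, 0.5, "hello"), (0.5, 1.25, "world")]

# Suppose the video recording started 2 seconds before the audio.
video_aligned = shift_intervals(audio, 2.0)
print(video_aligned)
```

A negative offset shifts the annotations earlier; this sketch does not check whether that would push a boundary below zero.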