TGA: Time Group Analyzer

An online tool for speech annotation mining

Dafydd Gibbon (Universität Bielefeld)

CITATION: In publications which use the online TGA tool, please cite:

Gibbon, Dafydd and Jue Yu. 2015. "Time Group Analyzer: Methodology And Implementation." The Phonetician 111/112:9-34.

See below for a list of papers for which the TGA tool has been used.

Python powered

  1. To use the TGA online annotation mining tool, proceed to one of the demos, and replace the demo annotation with your own. Then adjust the parameters for the kind of analysis you are looking for.
  2. The TGA application has now been re-designed as a multi-user system.
  3. However, server space is limited, and therefore graphics files which are older than a certain time (initially set at 2 min) are removed when TGA is re-run, either by yourself or by another user.
  4. Consequently, someone else may unintentionally delete your graphics files. This only affects you if you still need to download the files.
  5. If you have created graphics files and need to download them, but they have been removed because someone else has re-run the TGA, simply reload the page in the browser.
  6. If there are many users, this limit may cause problems. If you experience problems with this policy, please let me know and I will temporarily increase the wait time before cleanup.
  7. Graphics files have the format: "TGA_PID_*.png" (PID is the process ID).
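
The cleanup policy described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the TGA server code: the function name `remove_stale_graphics`, the glob pattern `TGA_*_*.png` (with the PID in the middle) and the use of file modification time as the age criterion are all hypothetical.

```python
import glob
import os
import time

MAX_AGE_SECONDS = 120  # the page states an initial limit of 2 min


def remove_stale_graphics(directory, max_age=MAX_AGE_SECONDS, now=None):
    """Delete TGA_<PID>_*.png files in `directory` older than `max_age` seconds.

    Hypothetical helper: file age is judged by modification time (an assumption).
    Returns the sorted base names of the files that were removed.
    """
    now = time.time() if now is None else now
    removed = []
    for path in glob.glob(os.path.join(directory, "TGA_*_*.png")):
        if now - os.path.getmtime(path) > max_age:
            os.remove(path)
            removed.append(os.path.basename(path))
    return sorted(removed)
```

Because the cleanup runs whenever anyone re-runs the tool, a file can disappear between creation and download, which is why reloading the page (and so regenerating the graphics) is the suggested remedy.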

Note: If the selected tier has fewer than 16 different label types (e.g. a tone tier), then a box-and-whisker plot is automatically drawn.

TGA output types DDT and DB
The DDT (Duration Difference Token) display shows a sequence of symbols: '/' (for short-long duration pairs), '\' (for long-short duration pairs) and '=' (for equal duration pairs), depending on the user-defined threshold for local duration differences. The motivation for the DDT representation is to illustrate the directionality (positive or negative) of duration differences between adjacent items, which is not captured by timing measures such as standard deviation or nPVI. In addition to showing individual DDT patterns, the TGA provides statistics over DDT n-gram sequences, which provide information about binary, ternary etc. rhythm types.
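
A minimal sketch of how such a token sequence and its n-gram counts could be computed is given here. The function names `ddt_tokens` and `ddt_ngrams` are hypothetical, and treating a pair as equal when the absolute difference does not exceed the threshold is an assumption about the threshold semantics, not a description of the TGA code.

```python
from collections import Counter


def ddt_tokens(durations, threshold=0):
    # Map each adjacent duration pair to '/', '\' or '='.
    # Assumption: a pair counts as equal when |difference| <= threshold.
    tokens = []
    for prev, curr in zip(durations, durations[1:]):
        diff = curr - prev
        if abs(diff) <= threshold:
            tokens.append("=")
        elif diff > 0:
            tokens.append("/")   # short-long pair
        else:
            tokens.append("\\")  # long-short pair
    return "".join(tokens)


def ddt_ngrams(tokens, n=2):
    # Count overlapping n-grams in a DDT token string.
    return Counter(tokens[i:i + n] for i in range(len(tokens) - n + 1))


# Syllable durations in ms, with a 10 ms threshold (illustrative values only).
tokens = ddt_tokens([120, 250, 130, 135, 300, 110], threshold=10)
bigrams = ddt_ngrams(tokens, n=2)
```

For these made-up durations the token string is `/\=/\`, and the bigram `/\` occurs twice, which is the kind of recurring alternation that n-gram statistics make visible.
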
The DB (Duration Bar) display shows the durations of the items in the annotation labels, e.g. syllables, both in ms and as bars whose width and length (scaled differently) show the durations directly.

TGA graph output type BoxPlot
The BoxPlot graph output type applies only to annotation tiers with fewer than 16 label types, for example tones, accents, phoneme major class (C, V, G, L etc.) annotations, for comparing duration properties of values of these categories. The output for each value contains the following graphical information:
  1. Error bar (left).
  2. Box plot (centre) with horizontal 1st, 2nd (median, red bar) and 3rd quartile bars, with outliers above the 4th quartile bar. The mean is indicated by a red dot.
  3. Vertical plot (yellow) roughly indicating the distribution of values.
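
The numbers behind such a plot can be sketched as follows, assuming the tier is available as (label, start, end) triples in seconds. The function name `boxplot_stats` and the choice of `statistics.quantiles` with its default method are assumptions; the TGA may compute its quartiles differently.

```python
from collections import defaultdict
from statistics import mean, quantiles

MAX_LABEL_TYPES = 16  # box plots apply only to tiers with fewer label types


def boxplot_stats(intervals):
    """Return per-label quartiles and mean, or None if there are too many label types.

    `intervals` is an iterable of (label, start, end) triples in seconds.
    """
    by_label = defaultdict(list)
    for label, start, end in intervals:
        by_label[label].append(end - start)
    if len(by_label) >= MAX_LABEL_TYPES:
        return None
    stats = {}
    for label, durs in by_label.items():
        if len(durs) < 2:
            continue  # quantiles() needs at least two data points
        q1, q2, q3 = quantiles(durs, n=4)
        stats[label] = {"q1": q1, "median": q2, "q3": q3, "mean": mean(durs)}
    return stats
```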

Box plot

TGA graph output type Wagner Quadrant Graphs (WQ graphs)
The Wagner Quadrant graph (WQ graph) is a scatter plot which displays the relation between durations of adjacent chunks of speech, e.g. syllables. The WQ graphs were developed by P. Wagner to provide information about rhythm types by illustrating and quantifying the directionality (positive or negative) of duration differences between adjacent items, which is not captured by measures such as standard deviation or nPVI.
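
One way to build the data for a quadrant-style scatter plot is sketched here. It assumes that each point pairs the z-scored duration of an item with that of its successor, so that the four quadrants correspond to short-short, short-long, long-short and long-long sequences. The function names, the z-score normalisation and the axis assignment are all assumptions for illustration; the TGA's own WQ graphs may be constructed differently.

```python
from statistics import mean, pstdev


def wq_points(durations):
    # Pair each z-scored duration with the z-scored duration of the next item.
    mu = mean(durations)
    sigma = pstdev(durations)
    if sigma == 0:
        raise ValueError("durations are all equal; z-scores are undefined")
    z = [(d - mu) / sigma for d in durations]
    return list(zip(z, z[1:]))


def quadrant_counts(points):
    # Count points per quadrant; values above the mean (z > 0) count as long.
    counts = {"short-short": 0, "short-long": 0, "long-short": 0, "long-long": 0}
    for x, y in points:
        first = "long" if x > 0 else "short"
        second = "long" if y > 0 else "short"
        counts[f"{first}-{second}"] += 1
    return counts
```

Under these assumptions, a strictly alternating short-long sequence puts all points in the short-long and long-short quadrants, while a concentration in the bottom left quadrant indicates runs of short items.
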
The Wagner Quadrant graphs in the animations below were generated directly by the TGA, but the animations were created offline from the TGA outputs.
Note in each case the clustering of dots in the bottom left quadrant for English, in contrast to the relatively random distribution for Mandarin, Tem and the poor Mandarin L2 speaker.
Syllable duration typology (raw durations): English - Tem - Mandarin
Syllable duration typology (unsigned durations): English - Tem - Mandarin
Syllable duration in L2 learning: poor L2, advanced L2 - native US
Syllable duration in English genres (Aix-MARSEC genres A-G)

Papers using TGA annotation mining methodology for analysis of rhythm, duration sequences and other timing patterns (currently Mandarin Chinese; L1 and L2 English; Polish)
  1. Yu, Jue and Gibbon, Dafydd, Criteria for database and tool design for speech timing analysis with special reference to Mandarin, Oriental COCOSDA 2012 (cf. IEEEexplore Conf ID 21048)
  2. Gibbon, Dafydd, TGA: a web tool for Time Group Analysis, TRASP 2013 (poster)
  3. Yu, Jue, Timing analysis with the help of SPPAS and TGA tools, TRASP 2013 (poster)
  4. Klessa, Katarzyna, Maciej Karpiński and Agnieszka Wagner, Annotation Pro: a new software tool for annotation of linguistic and paralinguistic features, TRASP 2013
  5. Klessa, Katarzyna and Dafydd Gibbon, Annotation Pro+TGA: automation of speech timing analysis, LREC 2013.
  6. Yu, Jue, Dafydd Gibbon and Katarzyna Klessa, Computational annotation-mining of syllable durations in speech varieties, Speech Prosody 7, 2014.
  7. Gibbon, Dafydd, Katarzyna Klessa and Jolanta Bachan, Duration and speed of speech events: A selection of methods. Lingua Posnaniensia, Volume 56, Issue 1 (Jun 2014). Studies in Phonetics and Psycholinguistics. Special issue dedicated to Professor Piotra Łobacz, Issue Editors: Maciej Karpiński, Nawoja Mikołajczak-Matyja. 59-83.
  8. Yu, Jue and Dafydd Gibbon, How natural is Chinese L2 English? ICPhS, Glasgow, 2015.
  9. Yu, Jue and Dafydd Gibbon, Time Group Types in Mandarin Syllable Annotations, O-COCOSDA, Shanghai, 2015.
  10. Gibbon, Dafydd and Jue Yu. Time Group Analyzer: Methodology And Implementation, The Phonetician 111/112:9-34. 2015.

  1. The duration visualisation graphics are not rendered correctly by Firefox, which has a bug in its HTML rendering. The vertical bars are incorrectly shown as circles. However, the information displayed by the circles is the same.
  2. This tool will DEFINITELY NOT work with most TextGrid files, because the tool is designed to handle ONLY one particular small set of ASCII symbols and (obviously) only interval tiers are handled.
  3. Within the above constraints, long or short TextGrid Interval Tier formats are handled. The tool was designed for syllable tiers, but in principle any tier can be handled. You will be lucky not to get arbitrary error output if you try anything else. Response time depends on TextGrid tier length and (for deceleration and acceleration) global threshold range. Be patient!
Automatic recognition heuristic for input data formats (examines only first line, not foolproof)
  1. Praat TextGrid full and short formats (specified tier picked out of arbitrarily many tiers)
  2. Single-tier CSV table (do not mix separators in the same data set):
      row := label sep starttime sep endtime [ sep duration ]
      sep := TAB | SP | "," | ";" | ":"
  3. Timestamp values in all formats are in seconds with dot decimal point (not milliseconds, and not comma), following Praat TextGrid conventions.
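
The single-tier table grammar above can be read with a short parser along the following lines. This is a sketch under assumptions, not the TGA's own recognition heuristic: the function names are hypothetical, the separator is guessed from the first line only (mirroring the note above), and the optional duration field is accepted but ignored.

```python
SEPARATORS = ["\t", ",", ";", ":", " "]  # the separators allowed by the grammar


def detect_separator(first_line):
    # Pick the first separator that splits the line into 3 or 4 fields.
    for sep in SEPARATORS:
        if len(first_line.split(sep)) in (3, 4):
            return sep
    raise ValueError("no supported separator found in first line")


def parse_rows(text):
    """Parse 'label sep starttime sep endtime [sep duration]' rows.

    Returns a list of (label, start, end) tuples with times in seconds.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    sep = detect_separator(lines[0])
    rows = []
    for ln in lines:
        fields = ln.split(sep)
        if len(fields) not in (3, 4):
            raise ValueError(f"malformed row: {ln!r}")
        # float() rejects decimal commas, matching the dot-decimal requirement.
        rows.append((fields[0], float(fields[1]), float(fields[2])))
    return rows
```

Since the separator is fixed after the first line, a data set that mixes separators fails with an error instead of being silently misread.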


