TGAplus V5.9.2 User Guide
Quick Start
- Open the app in a modern browser. A local web server is recommended for the modular version.
- In 1. I/O, either choose a DATA Project Directory or select one prompt text file or Praat TextGrid and one WAV file manually.
- Use tone-number Pinyin in a plain prompt, for example ni3 hao3, wo3 shi4 xue2 sheng1., or load an existing Praat .TextGrid as the transcription/annotation source.
- Check the loaded audio summary. Text analysis, annotation generation, and TGA processing update automatically when the required inputs are available.
- Review symbolic analysis or TextGrid-derived text in 2. Text and acoustic evidence in 4. Sound.
- Use 5. Annotation to inspect annotation tiers and edit generated transcription-based boundaries. Use 4. Sound for waveform, AM envelope, AM derivative, F0, playback and zoom controls.
- Use 3. TGA for Time Group Analysis and 6. RFA-FFT / 7. RFA-LPC for rhythm-formant analysis when needed.
- Return to 1. I/O to download or directly save TextGrid, JSON, CSV, or ZIP outputs.
Overview
This is a local browser-based tool for Mandarin tone-number Pinyin speech segmentation, symbolic analysis, acoustic inspection, F0 modelling, rhythm-formant analysis, manual boundary editing, and Praat TextGrid export.
The app is designed for a single prompt and a single WAV recording at a time. It uses transparent, rule-based and signal-processing methods rather than a trained ASR or forced-alignment model.
All processing is performed locally in the browser. The app has no server-side processing and does not intentionally upload audio or annotation data.
DATA Project Directory Workflow
The DATA project directory is selected by the user. It is not inferred from the folder containing the web app. When browser permissions allow it, the app ensures the following project structure exists:
DATA/
  Text/        input .txt prompt files
  Audio/       input .wav audio files
  Annotation/  direct TextGrid output
  Reports/     direct JSON/CSV output
If Text/ contains one .txt file, that prompt is loaded automatically. If it contains several .txt files, the app presents an in-app chooser. The same rule applies to .wav files in Audio/.
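The folder rules above can be sketched in Python for illustration. This is not the app's code (the app runs in the browser and depends on browser permissions); the function names ensure_project_structure and pick_input are hypothetical.

```python
from pathlib import Path

# Sub-folders named in the project structure above.
SUBFOLDERS = ("Text", "Audio", "Annotation", "Reports")


def ensure_project_structure(data_dir: Path) -> None:
    """Create any missing sub-folder; existing folders are left untouched."""
    for name in SUBFOLDERS:
        (data_dir / name).mkdir(parents=True, exist_ok=True)


def pick_input(folder: Path, suffix: str):
    """Return ("none", []), ("auto", [file]) or ("choose", files).

    "auto" mirrors the single-file case (load it directly);
    "choose" mirrors the several-files case (show a chooser).
    """
    files = sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == suffix
    )
    if not files:
        return "none", []
    if len(files) == 1:
        return "auto", files
    return "choose", files
```

For example, pick_input(Path("DATA") / "Audio", ".wav") would report "choose" when the Audio/ folder holds two recordings.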
Direct saving writes internally generated TextGrid files to Annotation/ and JSON/CSV reports to Reports/. Imported or pasted TextGrid input is read-only for TextGrid export and is not saved back as a TextGrid. Direct saving uses numbered filenames and refuses to overwrite existing output files or loaded input files. Manual file selection remains available as a fallback or for ad-hoc work outside a DATA directory.
Using a TextGrid as Input
The transcript input can be either a plain text prompt or a Praat TextGrid. A plain text prompt uses the app’s Pinyin parser and automatic segmentation. A TextGrid supplies annotation tiers directly and bypasses automatic tier generation. Praat long text and short text TextGrid formats are supported.
The paste dialog accepts either a transcription or a Praat TextGrid. TextGrid status is detected only from standard long or short Praat TextGrid headers; all other non-empty pasted text is treated as a transcription.
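A minimal sketch of header-based detection, assuming that a standard header means the first two non-empty lines are the Praat File type and Object class lines. The function names are hypothetical and the app's actual check may be stricter or looser.

```python
def looks_like_textgrid(text: str) -> bool:
    """True when the first two non-empty lines form a Praat TextGrid header."""
    # Drop a leading byte-order mark, then ignore blank lines.
    lines = [ln.strip() for ln in text.lstrip("\ufeff").splitlines() if ln.strip()]
    if len(lines) < 2:
        return False
    # Long and short text formats share these two header lines.
    # The prefix test also admits the older 'ooTextFile short' variant.
    return (
        lines[0].startswith('File type = "ooTextFile')
        and lines[1] == 'Object class = "TextGrid"'
    )


def classify_pasted_text(text: str) -> str:
    """Return "empty", "textgrid" or "transcription" for pasted text."""
    if not text.strip():
        return "empty"
    return "textgrid" if looks_like_textgrid(text) else "transcription"
```

Under this sketch, a pasted prompt such as ni3 hao3 is classified as a transcription, because it carries no TextGrid header.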
- In 1. I/O, choose Select Transcript / TextGrid File, place a file in DATA/Text/, or use Paste Text to paste either tone-number transcription or Praat TextGrid text.
- Load the matching WAV file from DATA/Audio/ or manually.
- Open 2. Text. The app selects a likely syllable or Pinyin tier by heuristic and extracts text from that tier.
- Use the Text source tier selector to choose another tier if needed. This selector is synchronised with the Annotation overlay tier.
- Open 5. Annotation to display the selected TextGrid tier over the waveform.
The syllable-tier heuristic uses tier-name cues such as syll, pinyin, and PY, plus non-silence interval rate, median interval duration, and label patterns. The 3–6 Hz interval-rate band is treated as a strong syllable cue, not as an absolute rule.
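The following sketch shows how such cues could be combined into a single score. All weights, the 0.1–0.4 s median-duration window, and the tone-digit label test are illustrative assumptions, not the app's actual values. Empty labels are treated as silence.

```python
from statistics import median

NAME_CUES = ("syll", "pinyin", "py")  # tier-name cues, matched case-insensitively


def syllable_tier_score(name, intervals):
    """Score a tier; intervals is a list of (start_s, end_s, label)."""
    speech = [(s, e, lab) for s, e, lab in intervals if lab.strip()]
    if not speech:
        return 0.0
    score = 0.0
    # Cue 1: tier name.
    if any(cue in name.lower() for cue in NAME_CUES):
        score += 2.0
    # Cue 2: non-silence interval rate; 3-6 Hz is a strong cue, not a rule.
    span = speech[-1][1] - speech[0][0]
    rate = len(speech) / span if span > 0 else 0.0
    if 3.0 <= rate <= 6.0:
        score += 2.0
    # Cue 3: median interval duration (assumed window of 0.1-0.4 s).
    if 0.1 <= median(e - s for s, e, _ in speech) <= 0.4:
        score += 1.0
    # Cue 4: label pattern, here the share of labels ending in a tone digit.
    tone_like = sum(1 for _, _, lab in speech if lab.strip()[-1] in "12345")
    score += tone_like / len(speech)
    return score
```

The tier with the highest score would be proposed as the default; the Text source tier selector still allows a different choice.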
Export rule: TextGrid export applies only to annotations generated internally from a plain prompt/transcription, including pasted transcription. Imported and pasted TextGrid data remain available for display, tier selection, FSA/TGA analysis, JSON/CSV reports, and TGA exports, but they are not rewritten or exported as TextGrid files.
Tab 1: I/O
The I/O tab contains the main project controls.
- Input: select a DATA project directory, choose a prompt/TextGrid and WAV manually, or paste transcription/TextGrid text.
- Audio: inspect decoded audio metadata such as duration, sample rate, and channel handling.
- Process: review automatic processing status, regenerate available results if needed, clear inputs/results, and review diagnostics.
- Output: download generated TextGrid, JSON, CSV, or ZIP outputs, or save directly to the DATA directory. Imported or pasted TextGrid input is analysed as input and is not re-exported as TextGrid.
- Theme: choose the interface colour theme. Signal-display colours are intentionally kept stable.
Tab 2: Text
The Text tab analyses the tone-number Pinyin prompt independently of the audio.
- Pinyin parse: tokenises tone-number Pinyin, separates punctuation, and records tone labels.
- Phoneme inventory: summarises the phonemic symbols generated from the Pinyin prompt.
- Allophone rules: displays the rule-based conversion assumptions used for expected allophone labels.
- Allophone text: shows the expected allophonic sequence and related table output.
- Finite State Phonology: builds a deterministic minimised acyclic FSA over Pinyin-derived phoneme/allophone sequences. In TextGrid mode, the source is the selected syllable-like tier.
The text layer is symbolic and rule-based. It does not use a pronunciation dictionary or statistical recogniser.
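As an illustration of the Pinyin parse step, the sketch below tokenises a tone-number prompt with a regular expression. The token pattern, the accepted tone digits (0 to 5) and the punctuation set are assumptions; the app's parser may differ.

```python
import re

# A syllable is a run of letters followed by one tone digit;
# punctuation marks are separate single-character tokens.
TOKEN_RE = re.compile(
    r"(?P<syl>[A-Za-züÜ]+)(?P<tone>[0-5])|(?P<punct>[,.;:!?，。；：！？])"
)


def parse_pinyin(prompt):
    """Return a list of syllable and punctuation tokens in prompt order."""
    tokens = []
    for m in TOKEN_RE.finditer(prompt):
        if m.group("punct"):
            tokens.append({"type": "punct", "text": m.group("punct")})
        else:
            tokens.append({
                "type": "syllable",
                "text": m.group("syl").lower(),
                "tone": int(m.group("tone")),
            })
    return tokens
```

Text that matches neither pattern, such as a syllable without a tone digit, is skipped silently in this sketch; a real parser would more likely report it.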
Tab 4: Sound
The Sound tab is the acoustic signal workspace. It provides waveform, AM envelope, AM derivative and F0 displays, together with their acoustic-display controls, playback controls, and shared zoom/selection controls.
- Plain waveform: normalised mono waveform display for inspecting recording quality, pauses, clipping, and overall signal shape.
- Zoom and playback controls: one control block below the waveform and a second shared control block below the AM derivative display.
- AM envelope controls and display: smoothing controls and the smoothed amplitude envelope used for syllable-scale loudness and boundary cues.
- AM derivative controls and display: derivative smoothing and the first derivative of the envelope, used to reveal rapid amplitude rises and candidate onset cues.
- F0 controls and display: F0 range, smoothing, interpolation, regression, presets, and the current F0 contour.
- Segmentation settings: controls for silence detection, syllable duration limits, implemented Mandarin-prior/SDR options, and AM derivative boundary candidates. Some visible controls are retained for interface continuity or future refinement; V5.9.2 does not introduce new segmentation algorithms.
- Alignment diagnostics: summaries of syllable duration ratios and automatic boundary adjustments.
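The AM envelope and AM derivative displays can be understood through a minimal sketch: rectify the waveform, smooth it, then take the first difference. The moving-average smoother and the 20 ms default are assumptions chosen for clarity, not the app's actual filter or setting.

```python
def moving_average(values, width):
    """Centred moving average; the window shrinks at the signal edges."""
    half = width // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def am_envelope(samples, sample_rate, smooth_ms=20.0):
    """Full-wave rectify the samples, then smooth over smooth_ms."""
    width = max(1, int(sample_rate * smooth_ms / 1000.0))
    return moving_average([abs(x) for x in samples], width)


def am_derivative(envelope, sample_rate):
    """First difference of the envelope, in amplitude units per second."""
    return [(b - a) * sample_rate for a, b in zip(envelope, envelope[1:])]
```

A large positive value in the derivative marks a rapid amplitude rise, which is the kind of event treated as a candidate onset cue.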
Tab 5: Annotation
The Annotation tab is restricted to annotation inspection and, where allowed, annotation editing. Acoustic-display controls for AM envelope, AM derivative and F0 are located in the Sound tab.
- Annotation status: indicates whether the current annotation is editable or read-only.
- Overlay tier: choose the annotation tier used for labels, duration statistics and inspection.
- Annotated waveform: waveform with selected-tier labels, boundaries, selection, playback cursor and editable boundary markers where editing is allowed.
- Boundary editing: for transcription-generated annotations, select, preview, shift, snap, undo, redo and reset syllable boundaries.
- TextGrid read-only mode: loaded or pasted TextGrid annotations can be inspected but not edited, rewritten or exported as TextGrid.
- Playback: play the whole file or selected intervals while inspecting labels and boundaries.
Boundary editing changes generated annotation intervals only; it does not alter the original audio or imported TextGrid files.
Tab 3: TGA
The TGA tab is a phased Time Group Analysis workspace. Phase A reuses the current generated or loaded TextGrid annotation, converts the selected interval tier to CSV-style rows, extracts tier text with linebreaks at pauses, computes global no-pause, pause-only and with-pause duration statistics in a compact comparison table, and segments pause-based Time Groups.
Phase B adds enhanced per-Time-Group statistics, Duration Difference Tokens based on a local duration-difference threshold, Time Group status, and a simple duration display. Phase C adds DDT n-grams and duration-shape measures. Phase D adds Time Trees, Wagner Quadrant plots, and tone-duration violin/box plots. Later phases will add batch mode.
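A minimal sketch of pause-based Time Group segmentation and Duration Difference Tokens follows. It assumes that every empty-labelled interval is a pause, and it uses the hypothetical token symbols "+", "-" and "=" with an illustrative 50 ms threshold; the app's symbols and default threshold may differ.

```python
def split_time_groups(intervals):
    """Split (start_s, end_s, label) rows into groups at empty-label pauses."""
    groups, current = [], []
    for start, end, label in intervals:
        if label.strip():
            current.append((start, end, label))
        elif current:  # a pause closes the current Time Group
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


def duration_difference_tokens(group, threshold_s=0.05):
    """Compare each interval's duration with that of its predecessor."""
    durations = [end - start for start, end, _ in group]
    tokens = []
    for prev, cur in zip(durations, durations[1:]):
        diff = cur - prev
        if diff > threshold_s:
            tokens.append("+")   # longer than the previous interval
        elif diff < -threshold_s:
            tokens.append("-")   # shorter than the previous interval
        else:
            tokens.append("=")   # within the threshold
    return tokens
```

A Time Group of n intervals yields n - 1 tokens, so a single-interval group yields none.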
Exports use the numbered non-overwriting filename rule and include TGA JSON, selected-tier interval CSV, global-statistics CSV, Time Group CSV, Phase C CSVs, Phase D CSVs, and a TGA ZIP. TextGrid export is available only for TextGrids generated internally from prompt/transcription input; imported or pasted TextGrid files remain read-only input and are not re-exported.
Tab 6: RFA-FFT
RFA-FFT = rhythm formants based on non-parametric FFT spectrum of the envelope.
This tab analyses low-frequency modulation structure in the amplitude envelope using FFT-based spectrum and spectrogram displays.
- RFA-FFT signal display: waveform plus AM envelope with shared interval marking and playback behaviour.
- AM low-frequency spectrum: FFT spectrum of the selected or visible envelope interval.
- AM low-frequency spectrogram: sliding-window view of envelope modulation strength over time and frequency.
- Ridges: ranked local modulation peaks can be displayed over the spectrogram.
- Peak labels: spectrum dots are labelled with rank, frequency in Hz, and period in milliseconds.
RFA-FFT is non-parametric: it shows envelope modulation components through FFT analysis rather than fitting an all-pole model.
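The sketch below illustrates the idea of a low-frequency envelope spectrum with ranked peak labels. For brevity it evaluates a direct DFT on a fixed frequency grid rather than an FFT, and the 20 Hz ceiling, 0.5 Hz step and function names are assumptions.

```python
import cmath
import math


def envelope_spectrum(envelope, frame_rate, max_hz=20.0, step_hz=0.5):
    """Return (frequency_hz, magnitude) pairs for the mean-removed envelope."""
    mean = sum(envelope) / len(envelope)
    centred = [v - mean for v in envelope]
    n = len(centred)
    spectrum = []
    k = 1
    while k * step_hz <= max_hz:
        f = k * step_hz
        acc = sum(
            v * cmath.exp(-2j * math.pi * f * i / frame_rate)
            for i, v in enumerate(centred)
        )
        spectrum.append((f, abs(acc) / n))
        k += 1
    return spectrum


def ranked_peaks(spectrum, top=3):
    """Label local maxima with rank, frequency in Hz and period in ms."""
    peaks = [
        spectrum[i] for i in range(1, len(spectrum) - 1)
        if spectrum[i][1] > spectrum[i - 1][1]
        and spectrum[i][1] >= spectrum[i + 1][1]
    ]
    peaks.sort(key=lambda p: p[1], reverse=True)
    return [
        {"rank": r, "hz": f, "period_ms": 1000.0 / f}
        for r, (f, _) in enumerate(peaks[:top], start=1)
    ]
```

The period label is the reciprocal of the frequency: a 5 Hz modulation peak corresponds to a 200 ms period, which is a typical syllable-scale duration.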
Tab 7: RFA-LPC
RFA-LPC = rhythm formants based on parametric LPC/all-pole model of the envelope.
This tab applies long-window LPC/all-pole modelling to rhythm-scale envelope sequences. It was labelled FDLP in earlier versions, but the current implementation is more accurately described as envelope-based RFA-LPC.
- Envelope source: choose frame AM, squared AM, RMS amplitude, RMS power / mean-square, Hilbert amplitude, or Hilbert power envelope.
- Rhythm-scale presets: choose full, phrase-scale, slow syllable / transition, syllable-scale, fast syllable / segmental, reduced syllable / segmental, or custom bands.
- Model controls: set window length, hop, model detail, pole rate, LP order, centering, and normalisation.
- RFA-LPC response: all-pole temporal-envelope response for the selected band.
- Stepped-band inspection: step through 3 Hz bands advancing by 2 Hz up to 20 Hz.
- Band sweep overview: display all stepped-band peak detections on a shared 0.1–20 Hz axis, with coloured lanes and labels.
- RFA-LPC rhythm map: sliding-window map with ranked ridge overlays.
RFA-LPC peaks are acoustic temporal-envelope resonance candidates. They are not direct neural recordings; neuromotor or neural-resonance interpretations require converging evidence.
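The stepped-band inspection described above implies overlapping bands: a 3 Hz band advancing by 2 Hz shares 1 Hz with its neighbour. The sketch below enumerates such bands. The 0 Hz starting point and the rule that a band must fit entirely below 20 Hz are assumptions; the app may start at a different frequency or clip the last band.

```python
def stepped_bands(width_hz=3.0, step_hz=2.0, start_hz=0.0, max_hz=20.0):
    """Return (low_hz, high_hz) bands that fit entirely below max_hz."""
    bands = []
    low = start_hz
    while low + width_hz <= max_hz:
        bands.append((low, low + width_hz))
        low += step_hz
    return bands
```

With these assumed defaults the bands run 0–3 Hz, 2–5 Hz, 4–7 Hz and so on, so a peak near a band edge can appear in two adjacent steps.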
TGA-style Duration Statistics
The Annotation tab includes a horizontal TGA-style descriptive-statistics table for the currently selected annotation tier. The table is placed near the top of the Annotation tab, immediately after the title and description and before the boundary-editing controls.
If no time interval is marked, the full selected tier is analysed. If a signal interval is marked, the statistics are recomputed for intervals whose midpoint falls inside the selected interval.
Two rows are shown: without pauses and including pauses. The including-pauses row includes internal pauses only, excluding leading and trailing pauses. Durations are reported in milliseconds and rates in Hz. Both population and sample standard deviations are shown; the coefficient of variation uses population standard deviation.
The same duration-statistics object is included in the JSON report under duration_statistics.
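The rules in this section can be sketched as follows. The dictionary keys, the empty-label pause convention and the definition of rate as the reciprocal of the mean duration are assumptions made for illustration; the keys inside the JSON duration_statistics object may be named differently.

```python
from statistics import mean, pstdev, stdev


def duration_row(durations_s):
    """Summarise durations given in seconds; None when there are none."""
    if not durations_s:
        return None
    ms = [d * 1000.0 for d in durations_s]
    m = mean(ms)
    sd_pop = pstdev(ms)
    return {
        "n": len(ms),
        "mean_ms": m,
        "sd_pop_ms": sd_pop,
        "sd_sample_ms": stdev(ms) if len(ms) > 1 else 0.0,
        "cv": sd_pop / m,          # uses the population standard deviation
        "rate_hz": 1000.0 / m,     # assumed: reciprocal of mean duration
    }


def duration_statistics(intervals, selection=None):
    """intervals: (start_s, end_s, label); selection: (lo_s, hi_s) or None."""
    if selection is not None:
        lo, hi = selection
        # Keep intervals whose midpoint falls inside the marked interval.
        intervals = [iv for iv in intervals if lo <= (iv[0] + iv[1]) / 2 <= hi]
    # Exclude leading and trailing pauses; internal pauses remain.
    while intervals and not intervals[0][2].strip():
        intervals = intervals[1:]
    while intervals and not intervals[-1][2].strip():
        intervals = intervals[:-1]
    speech = [e - s for s, e, lab in intervals if lab.strip()]
    everything = [e - s for s, e, _ in intervals]
    return {
        "without_pauses": duration_row(speech),
        "including_pauses": duration_row(everything),
    }
```

Because the population standard deviation divides by n and the sample standard deviation by n - 1, the sample value is always the larger of the two for the same data.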
Exports
- TextGrid: Praat interval tiers for internally generated prompt/transcription annotations only, including pasted transcription. Imported or pasted TextGrid input is not re-exported as TextGrid.
- JSON report: detailed metadata, settings, diagnostics, F0 summaries, RFA-FFT summaries, RFA-LPC summaries, and edit history.
- CSV report: compact one-row summary for corpus tables and later aggregation.
- ZIP: combined downloadable output bundle. It includes a TextGrid only when the current annotation was generated internally.
- Save to DATA: direct TextGrid and report saving when a DATA project directory has been selected and write permission is available.
- Numbered outputs: direct DATA saves use app-controlled non-overwriting numbered filenames, starting with _01 and searching up to _100. Browser downloads propose numbered filenames, but final collision handling also depends on the browser and operating system download behaviour.
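The numbered-output rule can be sketched as a search for the first free suffix. The filename pattern stem_NN.ext, the case-insensitive comparison and the function name are assumptions; only the _01 to _100 range comes from the rule above.

```python
def next_numbered_name(stem, suffix, existing, first=1, last=100):
    """Return the first free 'stem_NN + suffix' name, or None if all are taken.

    existing is an iterable of filenames already in the target folder.
    """
    taken = {name.lower() for name in existing}
    for n in range(first, last + 1):
        candidate = f"{stem}_{n:02d}{suffix}"
        if candidate.lower() not in taken:
            return candidate
    return None  # nothing is overwritten; the caller reports the failure
```

For example, with rec_01.TextGrid and rec_02.TextGrid already present in Annotation/, the sketch proposes rec_03.TextGrid.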
Running and Rebuilding
Running the modular app
cd app
python -m http.server
Open the local server URL shown by Python, typically http://localhost:8000. A local server is recommended because ES modules are more reliable from http://localhost than from file://.
Running the single-file build
Open the generated HTML file in dist/. The single-file build is regenerated from the modular source; do not edit it directly.
Rebuilding the single-file version
python tools/build_singlefile.py
Terminology Notes
- RFA-FFT: rhythm formants based on non-parametric FFT spectrum of the envelope.
- RFA-LPC: rhythm formants based on parametric LPC/all-pole model of the envelope.
- Phone formants: vocal-tract resonance frequencies such as F1, F2, and F3. These are not the same as rhythm-formant peaks.
- Rhythm-formant peaks: low-frequency modulation or temporal-envelope peaks in the speech signal, interpreted as acoustic evidence for rhythm-scale organisation.
Current Limitations
- The app remains a single-prompt, single-WAV tool.
- The segmentation method is heuristic and transcript-constrained; it is not a trained forced aligner.
- RFA-FFT and RFA-LPC comparison is not automated.
- Phone-formant tracking is not yet implemented.
- Batch processing, distance mapping, hierarchical clustering, and k-means analysis are reserved for later versions.
- Canvas plots now have concise accessible labels and stronger keyboard focus cues, but they are not yet complete text equivalents for all visual detail.
Version Notes
V5.9.2 is a distribution cleanup release: it improves accessibility semantics for the interactive TGA duration display, clears Time Group playback status reliably after playback ends, and avoids duplicate pasted-TextGrid parse warnings.
V5.9.1 adds inline TGn playback buttons to the visible TGA duration-bar display. Clicking a TGn button plays that Time Group audio span; while audio is playing, clicking any TGn button stops playback.
V5.7.5 fixes I/O Process card Annotation status for acceptable loaded or pasted TextGrid input. Annotation is shown as “generated” when WAV audio is loaded, or as “generated, no audio” when no WAV is loaded. Processing logic is unchanged.
V5.7.4 fixed misleading I/O Process card wording for TextGrid input by removing the generic phrase “automatic generation”.
V5.7.3 removed remaining duplicated status wording from the I/O Process card, for example avoiding “waiting … [waiting]”, and clarified the destructive clear action as clearing TGAplus local memory rather than general browser settings. Processing logic is unchanged.
V5.7.2 refines the I/O Process card status lines so that each row shows source/context plus a single status badge. It distinguishes transcription file, pasted transcription, TextGrid file, and pasted TextGrid sources where relevant, and includes selected tier names where useful. Processing logic is unchanged.
V5.7.1 replaces the old Run Segmentation / Run TGA controls in the I/O Process card with automatic processing status plus three explicit actions: regenerate results, clear inputs and results, and clear inputs, results and TGAplus local memory. Annotation edits now mark TGA as stale while editing and TGA regenerates automatically when leaving the Annotation tab.
V5.7.0 adds unified pasted-text input. The paste dialog accepts either a plain transcription or a Praat TextGrid. TextGrid status is detected only from standard Praat long or short TextGrid headers; all other non-empty pasted text is treated as a transcription. Pasted transcriptions follow the normal prompt/transcription workflow and can generate TextGrid output after segmentation. Pasted TextGrids remain read-only TextGrid inputs and are not re-exported as TextGrid.
V5.6.13 aligned documentation and lightweight accessibility.
V5.6.12 compacted the TextGrid/WAV information popup and increased the Text tab extracted/prompt text preview height.
V5.6.11 compacted the top card of the Text tab.
V5.6.10 fixed caption and legend overlap in signal and transform displays.
V5.6.9 updated package/UI versioning, restricted TextGrid export to internally generated prompt-based annotations, kept imported or pasted TextGrid input out of TextGrid export paths, and wired the low-risk existing segmentation controls that already have current processing meaning. Potential algorithmic changes for currently non-functional controls such as punctuation weighting and envelope-derivative sensitivity remain deferred.
V5.5.0 adds security and packaging hardening: safer filename/status handling, sanitised SVG exports, DATA-directory side-effect warnings, large-export safeguards, privacy/security documentation, and current-only standalone packaging.
V5.4.9 adds input-size warnings and shows WAV/TextGrid file sizes in the compatibility popup.
V5.4.8 compacts the Finite State Phonology diagram display by visually collapsing final epsilon transitions, bundling tone alternatives on terminal transitions, and adding a Show tones toggle.
V5.4.7 moves the TextGrid/WAV info button immediately after the Select DATA Project Directory button for easier access.
V5.4.6 renames the web app to TGAplus.
V5.4.5 moves TextGrid/WAV compatibility diagnostics into an informative popup that opens automatically when a TextGrid is loaded or pasted, and can later be reopened from the I/O tab.
V5.4.3 keeps the Run TGA button enabled on startup.
V5.4.2 adjusts I/O button layout.
V5.4.1 adds TGA explanatory notes and metric ranges.
V5.4.0 adds Time Tree similarity and robustness.
V5.3.7 adds Time Tree processing strategies.
V5.3.6 extends RFA spectrum peak markers and adds the TextGrid-loading OR separator.
V5.3.5 fixes TGA duration PNG export.
V5.3.4 adds figure export buttons.
V5.3.0 implements TGA Phase D structural visualisation.
V5.2.3 reorganises I/O workflow controls, moves exports to the Output card, and adds auto-run processing.
V5.2.2 adds TGA D-Wiggliness and D-Spaciousness.
V5.2.0 implements TGA Phase C.
V5.1.4 compacts the TGA global statistics card.
V5.1.3 preserves punctuation in TGA Time Group text.
V5.1.1 changes the TGA Phase B duration display to vertical bars.
V5.1.0 implements TGA Phase B.
V5.0.3 builds the FSA from plain transcription input or selected TextGrid tier text.
V5.0.2 adds Paste TextGrid input.
V5.0.0 adds the TGA tab.
V4.0.24 adds non-overwriting numbered output filenames (_01…_100) for TextGrid and report exports.
V4.0.23 corrects the TGA-style including-pauses row so it includes internal pauses only and excludes initial and final pauses.
V4.0.22 narrows the banner reveal region so tab buttons do not trigger it and places the DATA project-directory button after the manual transcript and WAV buttons.
V4.0.20 moves the TGA-style duration statistics table to the top of the Annotation tab and turns the startup banner into a top-edge reveal overlay with anti-flicker thresholds.
V4.0.19 adds TGA-style selected-tier duration statistics in the Annotation tab and includes the same measures in the JSON report.
V4.0.18 fixes selection-drag playback behaviour in Sound/Annotation and improves TextGrid input support, including Praat short text TextGrids and persistence of loaded TextGrid tiers after audio loading.
V4.0.15 revised the DATA project-directory workflow: expected folders are created when possible, and multiple input files trigger an in-app chooser.
V4.0.14 introduced compact tab labels: I/O, Text, Sound, Annotation, RFA-FFT, and RFA-LPC.
V4.0.13 standardised RFA-FFT/RFA-LPC terminology and introduced larger graph fonts.
V4.0.12 centralised shared selection and zoom across the four acoustic tabs.
V4.0.9–V4.0.11 added RFA-LPC stepped-band inspection, sweep overview, readable peak labels, transparent label backgrounds, and dot-redraw ordering.
V4.0.8 added RFA-LPC envelope-source choices including squared AM, RMS power, Hilbert amplitude, and Hilbert power.
V4.0 introduced the envelope-LPC rhythm analysis strategy that is now labelled RFA-LPC.