Input

Select a DATA project directory, or select one transcript/TextGrid file and one WAV file manually.

A DATA project directory contains Text/ and Audio/ for input, and Annotation/ and Reports/ for output. Text/ may contain .txt prompts or .TextGrid annotations. You may also paste either a plain transcription or a Praat TextGrid directly. Missing subdirectories are created when possible; if several candidate files are found, you will be asked which one to load.

Audio

Process

Diagnostics

Output

Export results

Theme

Choose a higher-contrast UI colour theme. Signal-display colours are unchanged.

Theme

Soft Jade active.

Text

Symbolic text analysis for tone-number Pinyin prompts and selected TextGrid tiers. The summary below reports the current text source and parser or tier statistics.

Text source / extracted text

No prompt loaded.

Pinyin Parse / TextGrid Tier Summary

Phoneme Inventory

Allophone Rules

Allophone Text

No allophone text generated.

Finite State Phonology - Work in progress

Deterministic, epsilon-free FSA modes for actual syllables, structural Pinyin phonemic-unit generalisation, and initial–final constrained possible syllables. Generalising modes license candidates with FSAs first; gap enumeration is downstream. In TextGrid mode, input is extracted from the selected syllable-like tier.

FSA mode; Tone policy; Legacy source tier; Show tones; Show gaps

FSA gap report

Transitions

Accepted syllable sequences

Sound: Acoustic Signal Workspace

Acoustic displays, playback, zoom, AM-envelope controls, AM-derivative controls, and F0 controls. The selection and zoom range are synchronised with Annotation, RFA-FFT, and RFA-LPC.

Plain signal waveform: display-normalised to -1 … 0 … 1

Normalised mono audio waveform without annotation overlays. Use this view to inspect recording quality, gross pauses, clipping, and overall signal shape before editing.

Plain AM envelope: normalised to 0 … 1

Smoothed amplitude envelope derived from the mono waveform. Use the AM controls in this Sound tab to adjust envelope and derivative smoothing.

AM envelope smoothing: 50 ms (≈ 8.9 Hz at -3 dB)

AM derivative smoothing: 40 ms (≈ 11.1 Hz at -3 dB)

Filter type: centred moving average. The Hz helper is the approximate -3 dB cutoff; the first null is at 1 / window duration.
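
The Hz helper values can be reproduced from the window duration alone. The following is a minimal sketch, not the app's own code: it assumes the ideal moving-average magnitude response |sin(πfT)/(πfT)| (a continuous-time approximation) and finds the half-power point numerically. The function names are illustrative.

```python
import math

def boxcar_gain(x):
    """Magnitude response of an ideal moving average at normalised frequency x = f * T."""
    if x == 0:
        return 1.0
    return abs(math.sin(math.pi * x) / (math.pi * x))

def cutoff_hz(window_s):
    """Approximate -3 dB cutoff: bisect for gain = 1/sqrt(2) on 0 < f*T < 1."""
    target = 1 / math.sqrt(2)
    lo, hi = 0.0, 1.0  # the gain falls monotonically from 1 to 0 on this range
    for _ in range(60):
        mid = (lo + hi) / 2
        if boxcar_gain(mid) > target:
            lo = mid
        else:
            hi = mid
    return lo / window_s

def first_null_hz(window_s):
    """First spectral null of a moving average of duration window_s."""
    return 1 / window_s

for ms in (50, 40):
    t = ms / 1000
    print(f"{ms} ms: -3 dB at {cutoff_hz(t):.1f} Hz, first null at {first_null_hz(t):.0f} Hz")
```

For the default windows this gives about 8.9 Hz and 11.1 Hz, with first nulls at 20 Hz and 25 Hz; the solution is equivalent to the common rule of thumb cutoff ≈ 0.443 / window duration.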

Plain AM envelope derivative: display-normalised to -1 … 0 … 1

First derivative of the smoothed AM envelope. Positive regions indicate rapid amplitude rises used as candidate syllable-onset cues; the display shares the marked interval with the other acoustic tabs.

F0 Estimation

Extended-support autocorrelation F0 estimation with DC removal and soft low-pass preprocessing. YIN and high-pass filtering are not used. Optionally, the De Looze–Hirst heuristic estimates the F0 range from first-pass raw F0 before smoothing. The grey trace shows the moving-median filtered F0 contour; the blue trace shows the moving-mean smoothed F0 contour used for summaries and reports. If F0 modelling is applied in the Sound tab, this overview also shows dashed interpolated gap values and the regression curve.

Plain F0 contour (Hz): grey = median, blue = smoothed, dashed = interpolated gaps, black = regression

F0 defaults: sets fields and recomputes

Minimum F0 (Hz); Maximum F0 (Hz); Frame length (ms); Hop (ms); F0 preprocessing low-pass (Hz); Low-pass order

Use De Looze–Hirst automatic F0 range estimate (q15×0.83 / q65×1.92, after DC removal and soft low-pass) Estimate: off

F0 moving median 3 frames ≈ 60 ms

F0 moving mean 3 frames ≈ 60 ms

Default profile: high voice, F0 120–450 Hz, frame 20 ms, hop 20 ms, low-pass 450 Hz/order 3, moving median 3 and moving mean 3. Low voice defaults can be applied with the profile button. The optional De Looze–Hirst range estimate affects the internal search range only; the y-axis remains the user-set F0 range. Grey is moving-median filtered F0; blue is moving-mean smoothed F0. Move away from a changed control to apply it.
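
The quantile rule named in the De Looze–Hirst option (q15×0.83 for the floor, q65×1.92 for the ceiling) can be sketched as follows. This is an illustration under assumptions, not the app's implementation: the percentile convention (linear interpolation), the use of 0 to mark unvoiced frames, and the function names are all hypothetical.

```python
def percentile(values, p):
    """Percentile by linear interpolation between sorted neighbours (one common convention)."""
    s = sorted(values)
    pos = (len(s) - 1) * p / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def estimate_f0_range(raw_f0_hz):
    """Search range from first-pass F0 values: floor = q15 * 0.83, ceiling = q65 * 1.92."""
    voiced = [f for f in raw_f0_hz if f > 0]  # assumption: 0 marks unvoiced frames
    if not voiced:
        return None
    return percentile(voiced, 15) * 0.83, percentile(voiced, 65) * 1.92

first_pass = [0, 180, 190, 200, 210, 0, 220, 230, 240, 0]
print(estimate_f0_range(first_pass))
```

As the paragraph above notes, a range estimated this way would only narrow the internal search range; the plotted y-axis stays at the user-set limits.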

F0 interpolation and regression

Gap interpolation is a derived reconstruction layer. It does not overwrite the smoothed F0 estimate. Regression models are fitted over the selected interval, or over the visible interval when no selection is marked.

Gap interpolation; Maximum gap (ms); Apply to; Regression model; Regression source

F0 settings are controlled here in the Sound tab; this display uses the shared time view and marked interval.

Segmentation Settings

Silence sensitivity; Minimum silence ms; Phrase pause ms; Min syllable ms; Max syllable ms; Comma-final max ms; Sentence-final max ms; Snap window ms; Punctuation influence; Use prior-guided alignment; Use SDR correction pass; Phrase-final lengthening protection; Mandarin phonotactic prior; AM derivative boundary candidates; SDR influence; Phrase-final protection strength; Onset prior strength; Envelope derivative sensitivity

Alignment

Prior-guided syllable alignment. The app uses the known transcript, AM-envelope cues, Mandarin onset priors, punctuation, phrase-final lengthening protection, and SDR diagnostics. No pre-trained acoustic model is used.

SDR and boundary-strength diagnostics

SDR = duration of current syllable / duration of previous syllable. High SDR values are treated as possible boundary evidence only when supported by punctuation, following pause, and acoustic cues.
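
The ratio itself is simple to compute. The sketch below only derives SDR values and flags high ones as candidates; the threshold of 1.8 and the function names are hypothetical, and the supporting checks described above (punctuation, following pause, acoustic cues) are deliberately left out.

```python
def sdr_values(durations_ms):
    """SDR for each syllable after the first: current duration / previous duration."""
    out = []
    for prev, cur in zip(durations_ms, durations_ms[1:]):
        out.append(cur / prev if prev > 0 else float("inf"))
    return out

def flag_candidates(durations_ms, threshold=1.8):
    """Indices of syllables whose SDR exceeds a (hypothetical) threshold."""
    return [i + 1 for i, r in enumerate(sdr_values(durations_ms)) if r > threshold]

durations = [200, 400, 100]
print(sdr_values(durations))       # ratios for the second and third syllable
print(flag_candidates(durations))  # candidate indices only, not confirmed boundaries
```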

Second-pass boundary adjustments

Annotation Inspection / Editing Workspace

Inspect annotation tiers and edit generated transcription-based boundaries. TextGrid-derived annotations are inspectable but read-only.

Overlay tier

Annotation status will appear after annotation tiers are available.

TGA-style duration statistics: selected annotation tier; durations in ms, rates in Hz

Descriptive timing measures are computed from the selected overlay tier. Without a marked interval, the full tier is used; with a marked interval, intervals whose midpoint falls inside the mark are used.

TGA-style duration statistics will appear after annotation tiers are available.

No annotation tier available.

Rows are computed without pauses and including pauses. The including-pauses row includes internal pauses only; initial and final pauses are excluded. The coefficient of variation is based on population standard deviation; both population and sample standard deviations are shown.
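
As a rough illustration of these rows, the sketch below computes both standard deviations, a coefficient of variation based on the population value, and a rate in Hz. The field names, the (duration, is_pause) tuple layout, and the definition of rate as 1 / mean duration are assumptions, not taken from the app.

```python
import statistics

def strip_edge_pauses(intervals):
    """Drop leading and trailing pauses; keep internal ones. Items are (duration_ms, is_pause)."""
    start, end = 0, len(intervals)
    while start < end and intervals[start][1]:
        start += 1
    while end > start and intervals[end - 1][1]:
        end -= 1
    return intervals[start:end]

def duration_stats(durations_ms):
    """Descriptive timing measures for a list of interval durations in ms."""
    mean = statistics.fmean(durations_ms)
    sd_pop = statistics.pstdev(durations_ms)
    sd_sample = statistics.stdev(durations_ms) if len(durations_ms) > 1 else 0.0
    return {
        "n": len(durations_ms),
        "mean_ms": mean,
        "sd_population_ms": sd_pop,
        "sd_sample_ms": sd_sample,
        "cv": sd_pop / mean,       # coefficient of variation from the population SD
        "rate_hz": 1000.0 / mean,  # assumption: rate = 1 / mean duration
    }

tier = [(300, True), (100, False), (50, True), (200, False), (400, True)]
with_pauses = [d for d, _ in strip_edge_pauses(tier)]
without_pauses = [d for d, is_pause in tier if not is_pause]
print(duration_stats(without_pauses))
print(duration_stats(with_pauses))
```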

Annotated signal waveform: display-normalised to -1 … 0 … 1

Normalised mono audio waveform with annotation labels, boundaries, selection, playhead, and manual boundary editing.

Smoothed AM envelope: normalised to 0 … 1

Derived from frame RMS of the mono waveform, smoothed with the centred moving-average window below, and normalised to 0…1. Used for syllable-scale loudness patterns and pause/onset detection.

AM envelope derivative: display-normalised to -1 … 0 … 1

First derivative of the derivative-smoothed AM envelope. Positive peaks indicate rapid amplitude rises and are used as candidate syllable-onset cues; boundaries are usually placed near the preceding envelope valley.

Annotated F0 contour (Hz): dashed = interpolated gaps, black = regression model

Annotated F0 contour will appear after segmentation.

Drag across any display to mark a working interval. Click inside the marked interval to play/stop it. If annotation boundaries exist, playback snaps to the leftmost and rightmost boundaries inside the mark when at least two internal boundaries are present; otherwise it plays the marked interval exactly. Drag a segmentation boundary inside the marked interval to shift it. If no interval is marked, clicks do not play.

TGA: Time Group Analysis

TGA analyses interval durations from the selected generated or TextGrid annotation tier. Phase A provides the core TGA foundation; Phase B adds enhanced Time Group statistics and Duration Difference Tokens; Phase C adds DDT n-grams, pattern summaries, acceleration/deceleration analysis, and D-Wiggliness / D-Spaciousness duration-shape measures. Phase D adds Time Trees, Wagner Quadrant plots, and tone duration violin/box visualisations.

TGA tier

TGA implementation phases

Phase A — Core TGA: implemented: TextGrid/generated tier input, text extraction, TextGrid-to-CSV table, compact global statistics, pause-based Time Groups, JSON/CSV/ZIP export.
Phase B — Time Group analysis: implemented: Per-Time-Group statistics, Duration Difference Tokens, local duration pattern table, and simple Time Group duration display.
Phase C — Duration pattern extensions: implemented in this version: DDT n-grams, pattern summaries, extended acceleration/deceleration analysis, and D-Wiggliness / D-Spaciousness duration-shape measures.
Phase D — Structural visualisation: implemented in this version: Time Trees, Wagner Quadrant plots, and boxplots.
Phase E — Batch mode: planned: Multiple TextGrids / ZIP input and cross-file summaries.

Pause / boundary labels; Minimum empty-pause duration (ms); Minimum Time Group length; Local Δdur threshold (ms); DDT longer symbol; DDT shorter symbol; DDT equal symbol; Phase C n-gram min n; Phase C n-gram max n; Phase C acceleration threshold (ms); D-Wiggliness threshold (ms); D-Spaciousness top-k excursions; Use log-ratio D-Spaciousness

TGA runs automatically after annotation tiers are available and refreshes when relevant inputs or settings change. If a time interval is marked, TGA uses intervals whose midpoint lies inside the selected interval.
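
The midpoint rule can be sketched in a few lines. The (start, end, label) tuple layout and the inclusive treatment of the mark edges are assumptions made for this example.

```python
def intervals_in_mark(intervals, mark_start, mark_end):
    """Keep (start, end, label) intervals whose midpoint lies inside the marked span."""
    selected = []
    for start, end, label in intervals:
        midpoint = (start + end) / 2
        if mark_start <= midpoint <= mark_end:  # inclusive edges are an assumption
            selected.append((start, end, label))
    return selected

tier = [(0.0, 0.2, "ni3"), (0.2, 0.5, "hao3"), (0.5, 0.9, ""), (0.9, 1.2, "wo3")]
print(intervals_in_mark(tier, 0.3, 1.0))
```

With a mark from 0.3 s to 1.0 s, the second and third intervals are kept: an interval that only partly overlaps the mark still counts when its midpoint is inside, while the last interval is dropped because its midpoint lies beyond the mark.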

Phase A — Text extraction

Phase A extracts text from the selected tier. Phase B aligns the extracted text with pause-based Time Group IDs.

Why: this checks what linguistic material the selected interval tier contributes before any duration grouping is interpreted.

No TGA analysis yet.

Phase A — Global statistics

Original-style compact TGA table comparing no-pause, pause-only, and with-pause statistics. Initial and final pauses are excluded; with-pause statistics include internal pauses only.

Why: no-pause values describe content timing, pause-only values describe silence/boundary timing, and with-pause values describe the complete delivery contour.

No TGA analysis yet.

Phase A — TextGrid-to-CSV interval table

CSV-style conversion of the selected interval tier, including timing, duration, label, pause/content flags, Time Group ID, position within group, and DDT-to-next information.

Why: this is the audit trail for TGA: every later statistic and visualisation can be traced back to these labelled intervals and durations.

No TGA analysis yet.

Phase B — Enhanced pause-based Time Groups

Time Groups are runs of content intervals separated by configured pause/boundary labels. Phase B adds per-group statistics, local Duration Difference Tokens, and group status.

Why: Time Groups approximate interpausal units, giving duration analysis a rhetorically and prosodically meaningful span.
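
A minimal sketch of pause-based grouping follows. The pause labels ("", "_", "sil"), the (label, duration_ms) layout, and the function name are hypothetical; the app's minimum empty-pause duration setting is not modelled here.

```python
def time_groups(intervals, pause_labels=("", "_", "sil"), min_length=1):
    """Split (label, duration_ms) intervals into runs of content separated by pauses."""
    groups, current = [], []
    for label, duration in intervals:
        if label.strip() in pause_labels:
            # A pause closes the current run, if it is long enough to keep.
            if current and len(current) >= min_length:
                groups.append(current)
            current = []
        else:
            current.append((label, duration))
    if current and len(current) >= min_length:
        groups.append(current)
    return groups

tier = [("", 300), ("ni3", 180), ("hao3", 240), ("_", 400),
        ("wo3", 150), ("shi4", 200), ("xue2", 210), ("sheng1", 320), ("", 500)]
for number, group in enumerate(time_groups(tier), start=1):
    print(number, [label for label, _ in group])
```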

No TGA analysis yet.

Phase B — Time Group duration display

Simple duration display for Time Groups. This is not yet a Time Tree visualisation.

Why: the bar view makes the raw duration contour visible before it is abstracted into DDT patterns, shape measures, or Time Trees.

No TGA analysis yet.

Phase C — DDT n-gram pattern summary

Global summaries of repeated Duration Difference Token n-grams using the configured Phase C n range.

Why: DDT n-grams capture repeated local lengthening, shortening, and level-duration patterns that may not be visible in aggregate statistics.
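
To make the idea concrete, the sketch below derives one token per adjacent pair of durations and then counts contiguous n-grams. The symbols "+", "-", "=" and the 50 ms threshold are placeholders for the configurable DDT symbols and local Δdur threshold; the comparison direction (next interval relative to current) is an assumption.

```python
from collections import Counter

def ddt_tokens(durations_ms, threshold_ms=50, longer="+", shorter="-", equal="="):
    """One token per adjacent pair: is the next interval longer, shorter, or about equal?"""
    tokens = []
    for cur, nxt in zip(durations_ms, durations_ms[1:]):
        diff = nxt - cur
        if diff > threshold_ms:
            tokens.append(longer)
        elif diff < -threshold_ms:
            tokens.append(shorter)
        else:
            tokens.append(equal)
    return tokens

def ngram_counts(tokens, n_min=2, n_max=3):
    """Count every contiguous token n-gram for n in n_min..n_max."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts["".join(tokens[i:i + n])] += 1
    return counts

tokens = ddt_tokens([100, 200, 100, 200, 100])
print(tokens)
print(ngram_counts(tokens).most_common())
```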

No TGA analysis yet.

Phase C — Time Group pattern details

Per-Time-Group DDT n-grams and dominant duration patterns.

Why: this shows which local duration patterns belong to each interpausal unit instead of merging all pattern evidence globally.

No TGA analysis yet.

Phase C — Acceleration / deceleration summary

Second-difference duration analysis, DDT runs, turning points, and compact profile labels.

Why: second differences identify changes in the rate of duration change, while runs and turning points summarise local rhythmic direction.
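
The arithmetic behind this can be shown briefly. The sketch below computes first and second differences and locates sign changes in the first difference; how the app applies its acceleration threshold or names its profile labels is not stated here, so those steps are omitted.

```python
def first_differences(durations_ms):
    """Duration change between neighbours: d[i+1] - d[i]."""
    return [b - a for a, b in zip(durations_ms, durations_ms[1:])]

def second_differences(durations_ms):
    """Change in the duration change: d[i+2] - 2*d[i+1] + d[i]."""
    return first_differences(first_differences(durations_ms))

def turning_points(durations_ms):
    """Indices where the duration change flips sign (a local peak or valley)."""
    d1 = first_differences(durations_ms)
    return [i + 1 for i, (a, b) in enumerate(zip(d1, d1[1:])) if a * b < 0]

durations = [100, 150, 250, 200, 180]
print(first_differences(durations))
print(second_differences(durations))
print(turning_points(durations))
```

In this example the only turning point is the 250 ms interval, where lengthening gives way to shortening.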

No TGA analysis yet.

Phase C — D-Wiggliness and D-Spaciousness

Duration-shape measures for no-pause content rhythm, pause-only rhetorical/dialogue fluctuation, and with-pause delivery fluctuation.

Why: D-Wiggliness measures direction-change density; D-Spaciousness measures the size of the largest duration excursions.

No TGA analysis yet.

Phase C — Duration shape by Time Group

Compact per-Time-Group view of no-pause and with-pause D-Wiggliness / D-Spaciousness plus an indicative shape profile.

Why: per-group shape values help locate where global rhythmic fluctuation is concentrated in the signal.

No TGA analysis yet.

Phase D — Time Trees

Duration-induced Time Trees for each Time Group. Nuclear-type trees prefer shorter→longer relations; Compound-type trees prefer longer→shorter relations. Parent values inherit the strongest child everywhere. Processing can use the global-best, left-to-right, or right-to-left strategy.

Why: Time Trees turn a duration sequence into an order-preserving hierarchy, making competing temporal groupings explicit. The Play/Stop controls play the continuous audio span of the selected Time Group series, including pauses.

Time Tree relation: Nuclear, Compound

Processing: Global-best, Left-to-right, Right-to-left

Time Group span selector

Click one TG for a single interpausal unit, or click two TGs to select a contiguous span. Intermediate TGs are highlighted automatically. Build confirms the selected span.

Include pauses in selected-span tree

No Time Group span selected.

No TGA analysis yet.

Phase D — Time Tree similarity and robustness

Gibbon-style Tree Similarity Index and span-based robustness diagnostics comparing the six Nuclear/Compound × processing-strategy Time Trees for the current Time Group or selected span.

Ranges: Gibbon TSI and Jaccard similarity range from 0 to 1, where higher means more similar. Jaccard distance and normalised RF-like distance range from 0 to 1, where lower means more similar. Raw RF-like distance is size-dependent. Branch robustness is count/6.

Why: this tests whether the induced Time Tree is stable across relation and processing assumptions, using shared ordered spans as the comparison basis.
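
The span-based measures can be illustrated with plain set operations. In this sketch each tree is represented as a set of (first_leaf, last_leaf) spans, which is an assumed representation; the Gibbon TSI and the normalised RF-like distance are not reproduced because their exact definitions are not given here.

```python
from collections import Counter

def jaccard_similarity(spans_a, spans_b):
    """Shared spans divided by all spans; 1.0 means identical span sets."""
    a, b = set(spans_a), set(spans_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def jaccard_distance(spans_a, spans_b):
    return 1.0 - jaccard_similarity(spans_a, spans_b)

def rf_like_distance(spans_a, spans_b):
    """Raw count of spans found in only one tree (grows with tree size)."""
    return len(set(spans_a) ^ set(spans_b))

def branch_robustness(trees):
    """For each span, the fraction of trees that contain it (count / number of trees)."""
    counts = Counter(span for tree in trees for span in set(tree))
    return {span: n / len(trees) for span, n in counts.items()}

balanced = {(0, 1), (2, 3), (0, 3)}      # ((a b) (c d))
right_heavy = {(2, 3), (1, 3), (0, 3)}   # (a (b (c d)))
print(jaccard_similarity(balanced, right_heavy))
print(rf_like_distance(balanced, right_heavy))
print(branch_robustness([balanced, right_heavy]))
```

With six trees in the comparison, as described above, the robustness of a span would be its count divided by 6.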

Root span: Exclude root, Include root

No TGA analysis yet.

Phase D — Wagner Quadrant plot

Z-score-normalised duration transitions using x = z(dᵢ), y = z(dᵢ₊₁). Quadrants are labelled Iambic, Pyrrhic, Spondaic, and Trochaic.

Why: adjacent z-score transitions show whether neighbouring intervals form short-long, short-short, long-long, or long-short timing relations.
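
A small sketch of the quadrant assignment is given below. It assumes population standard deviation for the z-scores and treats z = 0 as long; both choices, and the function names, are assumptions rather than documented behaviour.

```python
import statistics

def z_scores(durations_ms):
    """Standardise durations; returns zeros when all durations are equal."""
    mean = statistics.fmean(durations_ms)
    sd = statistics.pstdev(durations_ms)  # assumption: population SD
    if sd == 0:
        return [0.0] * len(durations_ms)
    return [(d - mean) / sd for d in durations_ms]

def quadrant(x, y):
    """Label one transition; z < 0 counts as short, z >= 0 as long (tie rule is an assumption)."""
    if x < 0 and y >= 0:
        return "Iambic"      # short-long
    if x < 0 and y < 0:
        return "Pyrrhic"     # short-short
    if x >= 0 and y >= 0:
        return "Spondaic"    # long-long
    return "Trochaic"        # long-short

def wagner_points(durations_ms):
    """(x, y, label) for each adjacent pair, with x = z(d_i) and y = z(d_i+1)."""
    z = z_scores(durations_ms)
    return [(x, y, quadrant(x, y)) for x, y in zip(z, z[1:])]

for point in wagner_points([100, 100, 300, 300]):
    print(point)
```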

No TGA analysis yet.

Phase D — Tone duration distributions

Violin plots superimposed on box plots for interval durations grouped by tones 1–5. The red dot marks the mean duration for each tone.

Why: grouping durations by tone makes possible tone–duration relations visible without assuming that tones have identical temporal distributions.

No TGA analysis yet.

RFA-FFT Rhythm Analysis: Signal and AM Envelope

RFA-FFT: rhythm formants based on non-parametric FFT spectrum of the envelope. Drag over the combined display to select the interval used for the spectrum and spectrogram.

RFA-FFT signal display: waveform with superimposed AM envelope

RFA-FFT interval: — Duration: — Envelope rate: —

AM envelope smoothing: 50 ms (≈ 8.9 Hz at -3 dB)

AM derivative smoothing: 40 ms (≈ 11.1 Hz at -3 dB)

These AM controls are synchronised with the Sound-tab AM controls. RFA-FFT uses the current committed AM envelope.

AM Low-Frequency Spectrum

FFT spectrum of the AM envelope contour in the selected interval. Peaks are labelled as candidate rhythm formants and are based on the displayed smoothed spectrum.

Minimum frequency (Hz); Maximum frequency (Hz); Window; Envelope centering; Spectrum smoothing; Smoothing span (bins); Bidirectional spectrogram sweep

AM low-frequency spectrum: selected interval only

AM Low-Frequency Spectrogram

Mandatory RFA-FFT spectrogram of the selected interval. For intervals shorter than 6 s the card remains visible and reports the required correction.

FFT window (s); Window hop (s); Spectrogram contrast

Ridge ranks

Ridge threshold; Ridge smoothing (frames)

AM low-frequency spectrogram: same frequency range as the spectrum

End handling: incomplete final windows are padded by copying the final envelope value to preserve the selected time scale. Bidirectional sweep, when enabled, combines forward and reversed-envelope sweeps.
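
The padding rule can be sketched as follows. Window length and hop are given in samples of the envelope; the stopping rule (stop after the first window that reaches the end of the envelope) is an assumption, since the number of trailing windows kept by the app is not stated.

```python
def sliding_windows(envelope, window_len, hop):
    """Fixed-length windows; a short final window is padded with the last envelope value."""
    if not envelope:
        return []
    windows = []
    start = 0
    while True:
        chunk = list(envelope[start:start + window_len])
        chunk += [envelope[-1]] * (window_len - len(chunk))  # repeat the final value
        windows.append(chunk)
        if start + window_len >= len(envelope):
            break  # this window already reaches the end of the envelope
        start += hop
    return windows

print(sliding_windows([1, 2, 3, 4, 5], window_len=4, hop=2))
```

Padding with the final value, rather than with zeros, avoids introducing an artificial downward step at the end of the last window.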

RFA-LPC Rhythm Analysis: Envelope-Based LPC Model

RFA-LPC is used here as a long-window LPC/all-pole model of temporal-envelope structure. The low-frequency peaks are acoustic temporal-envelope resonance candidates, not direct neural recordings and not vocal-tract phone formants.

RFA-LPC = rhythm formants based on parametric LPC/all-pole model of the envelope.

RFA-LPC Input Envelope

Drag over the combined display to select the interval used for the RFA-LPC response and rhythm map. If no interval is marked, RFA-LPC uses the whole signal.

RFA-LPC signal display: waveform with superimposed modelling envelope

RFA-LPC interval: — Duration: — Envelope rate: —

RFA-LPC uses the current committed AM envelope smoothing from the Sound-tab AM controls. This V4 tab is independent of the RFA-FFT tab; comparison is reserved for a later version.

RFA-LPC Temporal-Envelope Response

Long-window all-pole response of the selected envelope interval. Peaks are labelled as candidate temporal-envelope resonances.

Rhythm scale preset; Minimum frequency (Hz); Maximum frequency (Hz); Envelope source; Envelope centering; Normalisation; Model detail; Pole rate (poles/s); LP order

Stepped 3 Hz band

Stepped bands are 3 Hz wide and advance by 2 Hz across 0.1–20 Hz. Band sweep keeps the other RFA-LPC settings unchanged and displays band-specific peaks without thresholding.
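
The band layout can be enumerated as in the sketch below. It assumes that the first band starts at 0.1 Hz and that a final band extending past 20 Hz is clipped at 20 Hz; the app may instead drop such a band, so the band count here is illustrative only.

```python
def stepped_bands(f_min=0.1, f_max=20.0, width=3.0, step=2.0):
    """Band edges (low, high) in Hz; the last band is clipped at f_max (an assumption)."""
    bands = []
    k = 0
    while True:
        low = round(f_min + k * step, 6)  # rounding avoids float drift in the edges
        if low >= f_max:
            break
        bands.append((low, min(round(low + width, 6), f_max)))
        k += 1
    return bands

for low, high in stepped_bands():
    print(f"{low:.1f}-{high:.1f} Hz")
```

Because the step (2 Hz) is smaller than the width (3 Hz), neighbouring bands overlap by 1 Hz, so a peak near a band edge appears in two lanes.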

RFA-LPC temporal-envelope response: selected interval only

RFA-LPC Stepped-Band Sweep Overview

Displays 3 Hz RFA-LPC bands, stepped by 2 Hz, as coloured lanes on a shared 0.1–20 Hz axis. Peak dots are band-specific and unthresholded.

RFA-LPC stepped-band sweep: coloured lanes and peak labels on 0.1–20 Hz

RFA-LPC Rhythm Map

Sliding long-window RFA-LPC map of temporal-envelope response. For intervals shorter than 6 s the card remains visible and reports the required correction.

RFA-LPC window (s); Window hop (s); Map contrast

Ridge ranks

Ridge threshold; Ridge smoothing (frames)

RFA-LPC rhythm map: same frequency range as the response

End handling: incomplete final windows are padded by copying the final envelope value to preserve the selected time scale. The RFA-LPC interpretation is acoustic and exploratory; neural or neuromotor resonance claims require converging evidence.

TGAplus V5.9.2 User Guide

Quick Start

Open the app in a modern browser. A local web server is recommended for the modular version.
In 1. I/O, either choose a DATA Project Directory or select one prompt text file or Praat TextGrid and one WAV file manually.
Use tone-number Pinyin in a plain prompt, for example: ni3 hao3, wo3 shi4 xue2 sheng1. Alternatively, load an existing Praat .TextGrid as the transcription/annotation source.
Check the loaded audio summary. Text analysis, annotation generation, and TGA processing update automatically when the required inputs are available.
Review symbolic analysis or TextGrid-derived text in 2. Text and acoustic evidence in 4. Sound.
Use 5. Annotation to inspect annotation tiers and edit generated transcription-based boundaries. Use 4. Sound for waveform, AM envelope, AM derivative, F0, playback and zoom controls.
Use 3. TGA for Time Group Analysis and 6. RFA-FFT / 7. RFA-LPC for rhythm-formant analysis when needed.
Return to 1. I/O to download or directly save TextGrid, JSON, CSV, or ZIP outputs.

Overview

This is a local browser-based tool for Mandarin tone-number Pinyin speech segmentation, symbolic analysis, acoustic inspection, F0 modelling, rhythm-formant analysis, manual boundary editing, and Praat TextGrid export.

The app is designed for a single prompt and a single WAV recording at a time. It uses transparent, rule-based and signal-processing methods rather than a trained ASR or forced-alignment model.

All processing is performed locally in the browser. The app has no server-side processing and does not intentionally upload audio or annotation data.

DATA Project Directory Workflow

The DATA project directory is selected by the user. It is not inferred from the folder containing the web app. When browser permissions allow it, the app ensures the following project structure exists:

DATA/
  Text/        input .txt prompt or .TextGrid files
  Audio/       input .wav audio files
  Annotation/  direct TextGrid output
  Reports/     direct JSON/CSV output

If Text/ contains one .txt file, that prompt is loaded automatically. If it contains several .txt files, the app presents an in-app chooser. The same rule applies to .wav files in Audio/.

Direct saving writes internally generated TextGrid files to Annotation/ and JSON/CSV reports to Reports/. Imported or pasted TextGrid input is treated as read-only and is not saved back as a TextGrid. Direct saving uses numbered filenames and refuses to overwrite existing output files or loaded input files. Manual file selection remains available as a fallback or for ad-hoc work outside a DATA directory.

Using a TextGrid as Input

The transcript input can be either a plain text prompt or a Praat TextGrid. A plain text prompt uses the app’s Pinyin parser and automatic segmentation. A TextGrid supplies annotation tiers directly and bypasses automatic tier generation. Praat long text and short text TextGrid formats are supported.

The paste dialog accepts either a transcription or a Praat TextGrid. TextGrid status is detected only from standard long or short Praat TextGrid headers; all other non-empty pasted text is treated as a transcription.
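
Header-based detection of this kind might look like the sketch below. Both the long and the short Praat text formats begin with the same two header lines, which is what the check relies on; the function names and the handling of a leading byte-order mark are assumptions.

```python
def looks_like_textgrid(text):
    """True if the first two non-empty lines are the standard Praat TextGrid header."""
    lines = [ln.strip() for ln in text.lstrip("\ufeff").splitlines() if ln.strip()]
    if len(lines) < 2:
        return False
    return (lines[0] == 'File type = "ooTextFile"'
            and lines[1] == 'Object class = "TextGrid"')

def classify_paste(text):
    """Classify pasted text as 'empty', 'textgrid', or 'transcription'."""
    if not text.strip():
        return "empty"
    return "textgrid" if looks_like_textgrid(text) else "transcription"

pasted = 'File type = "ooTextFile"\nObject class = "TextGrid"\n\nxmin = 0\nxmax = 2.3\n'
print(classify_paste(pasted))
print(classify_paste("ni3 hao3, wo3 shi4 xue2 sheng1."))
```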

In 1. I/O, choose Select Transcript / TextGrid File, place a file in DATA/Text/, or use Paste Text to paste either tone-number transcription or Praat TextGrid text.
Load the matching WAV file from DATA/Audio/ or manually.
Open 2. Text. The app selects a likely syllable or Pinyin tier by heuristic and extracts text from that tier.
Use the Text source tier selector to choose another tier if needed. This selector is synchronised with the Annotation overlay tier.
Open 5. Annotation to display the selected TextGrid tier over the waveform.

The syllable-tier heuristic uses tier-name cues such as syll, pinyin, and PY, plus non-silence interval rate, median interval duration, and label patterns. The 3–6 Hz interval-rate band is treated as a strong syllable cue, not as an absolute rule.

Export rule: TextGrid export applies only to annotations generated internally from a plain prompt/transcription, including pasted transcription. Imported and pasted TextGrid data remain available for display, tier selection, FSA/TGA analysis, JSON/CSV reports, and TGA exports, but they are not rewritten or exported as TextGrid files.

Tab 1: I/O

The I/O tab contains the main project controls.

Input: select a DATA project directory, choose a prompt/TextGrid and WAV manually, or paste transcription/TextGrid text.
Audio: inspect decoded audio metadata such as duration, sample rate, and channel handling.
Process: review automatic processing status, regenerate available results if needed, clear inputs/results, and review diagnostics.
Output: download generated TextGrid, JSON, CSV, or ZIP outputs, or save directly to the DATA directory. Imported or pasted TextGrid input is analysed as input and is not re-exported as TextGrid.
Theme: choose the interface colour theme. Signal-display colours are intentionally kept stable.

Tab 2: Text

The Text tab analyses the tone-number Pinyin prompt independently of the audio.

Pinyin parse: tokenises tone-number Pinyin, separates punctuation, and records tone labels.
Phoneme inventory: summarises the phonemic symbols generated from the Pinyin prompt.
Allophone rules: displays the rule-based conversion assumptions used for expected allophone labels.
Allophone text: shows the expected allophonic sequence and related table output.
Finite State Phonology: builds a deterministic minimised acyclic FSA over Pinyin-derived phoneme/allophone sequences. In TextGrid mode, the source is the selected syllable-like tier.

The text layer is symbolic and rule-based. It does not use a pronunciation dictionary or statistical recogniser.
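
A rule-based tokeniser for tone-number Pinyin can be very small. The sketch below is an illustration only: the regular expression, the accepted punctuation set, and the choice to skip anything that does not match (for example syllables without a tone digit) are assumptions, not the app's parser.

```python
import re

# A syllable is a run of letters followed by a tone digit 1-5; punctuation is kept separately.
TOKEN = re.compile(r"([A-Za-zü]+)([1-5])|([,.;:!?，。；：！？])")

def parse_pinyin(prompt):
    """Return (syllable, tone) pairs and (punctuation, None) pairs in prompt order."""
    tokens = []
    for syllable, tone, punct in TOKEN.findall(prompt):
        if syllable:
            tokens.append((syllable.lower(), int(tone)))
        else:
            tokens.append((punct, None))
    return tokens

print(parse_pinyin("ni3 hao3, wo3 shi4 xue2 sheng1."))
```

For the Quick Start example this yields six syllable tokens with tones 3, 3, 3, 4, 2, 1 and two punctuation tokens.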

Tab 4: Sound

The Sound tab is the acoustic signal workspace. It provides waveform, AM envelope, AM derivative and F0 displays, together with their acoustic-display controls, playback controls, and shared zoom/selection controls.

Plain waveform: normalised mono waveform display for inspecting recording quality, pauses, clipping, and overall signal shape.
Zoom and playback controls: one control block below the waveform and a second shared control block below the AM derivative display.
AM envelope controls and display: smoothing controls and the smoothed amplitude envelope used for syllable-scale loudness and boundary cues.
AM derivative controls and display: derivative smoothing and the first derivative of the envelope, used to reveal rapid amplitude rises and candidate onset cues.
F0 controls and display: F0 range, smoothing, interpolation, regression, presets, and the current F0 contour.
Segmentation settings: controls for silence detection, syllable duration limits, implemented Mandarin-prior/SDR options, and AM derivative boundary candidates. Some visible controls are retained for interface continuity or future refinement; V5.9.2 does not introduce new segmentation algorithms.
Alignment diagnostics: summaries of syllable duration ratios and automatic boundary adjustments.

Tab 5: Annotation

The Annotation tab is restricted to annotation inspection and, where allowed, annotation editing. Acoustic-display controls for AM envelope, AM derivative and F0 are located in the Sound tab.

Annotation status: indicates whether the current annotation is editable or read-only.
Overlay tier: choose the annotation tier used for labels, duration statistics and inspection.
Annotated waveform: waveform with selected-tier labels, boundaries, selection, playback cursor and editable boundary markers where editing is allowed.
Boundary editing: for transcription-generated annotations, select, preview, shift, snap, undo, redo and reset syllable boundaries.
TextGrid read-only mode: loaded or pasted TextGrid annotations can be inspected but not edited, rewritten or exported as TextGrid.
Playback: play the whole file or selected intervals while inspecting labels and boundaries.

Boundary editing changes generated annotation intervals only; it does not alter the original audio or imported TextGrid files.

Tab 3: TGA

The TGA tab is a phased Time Group Analysis workspace. Phase A reuses the current generated or loaded TextGrid annotation, converts the selected interval tier to CSV-style rows, extracts tier text with linebreaks at pauses, computes global no-pause, pause-only and with-pause duration statistics in a compact comparison table, and segments pause-based Time Groups.

Phase B adds enhanced per-Time-Group statistics, Duration Difference Tokens based on a local duration-difference threshold, Time Group status, and a simple duration display. Phase C adds DDT n-grams and duration-shape measures. Phase D adds Time Trees, Wagner Quadrant plots, and tone-duration violin/box plots. Later phases will add batch mode.

Exports use the numbered non-overwriting filename rule and include TGA JSON, selected-tier interval CSV, global-statistics CSV, Time Group CSV, Phase C CSVs, Phase D CSVs, and a TGA ZIP. TextGrid export is available only for TextGrids generated internally from prompt/transcription input; imported or pasted TextGrid files remain read-only input and are not re-exported.

Tab 6: RFA-FFT

RFA-FFT = rhythm formants based on non-parametric FFT spectrum of the envelope.

This tab analyses low-frequency modulation structure in the amplitude envelope using FFT-based spectrum and spectrogram displays.

RFA-FFT signal display: waveform plus AM envelope with shared interval marking and playback behaviour.
AM low-frequency spectrum: FFT spectrum of the selected or visible envelope interval.
AM low-frequency spectrogram: sliding-window view of envelope modulation strength over time and frequency.
Ridges: ranked local modulation peaks can be displayed over the spectrogram.
Peak labels: spectrum dots are labelled with rank, frequency in Hz, and period in milliseconds.

RFA-FFT is non-parametric: it shows envelope modulation components through FFT analysis rather than fitting an all-pole model.

Tab 7: RFA-LPC

RFA-LPC = rhythm formants based on parametric LPC/all-pole model of the envelope.

This tab applies long-window LPC/all-pole modelling to rhythm-scale envelope sequences. Earlier versions labelled it FDLP, but the current implementation is more accurately described as envelope-based RFA-LPC.

Envelope source: choose frame AM, squared AM, RMS amplitude, RMS power / mean-square, Hilbert amplitude, or Hilbert power envelope.
Rhythm-scale presets: choose full, phrase-scale, slow syllable / transition, syllable-scale, fast syllable / segmental, reduced syllable / segmental, or custom bands.
Model controls: set window length, hop, model detail, pole rate, LP order, centering, and normalisation.
RFA-LPC response: all-pole temporal-envelope response for the selected band.
Stepped-band inspection: step through 3 Hz bands advancing by 2 Hz up to 20 Hz.
Band sweep overview: display all stepped-band peak detections on a shared 0.1–20 Hz axis, with coloured lanes and labels.
RFA-LPC rhythm map: sliding-window map with ranked ridge overlays.

RFA-LPC peaks are acoustic temporal-envelope resonance candidates. They are not direct neural recordings; neuromotor or neural-resonance interpretations require converging evidence.

Shared Selection, Zoom, and Playback

The four acoustic tabs share one signal time view and one marked interval:

4. Sound
5. Annotation
6. RFA-FFT
7. RFA-LPC

A selection or zoom action made in any of these tabs is propagated to the others. The Info tab is documentation only and does not participate in acoustic selection or playback.

TGA-style Duration Statistics

The Annotation tab includes a horizontal TGA-style descriptive-statistics table for the currently selected annotation tier. The table is placed near the top of the Annotation tab, immediately after the title and description and before the boundary-editing controls.

If no time interval is marked, the full selected tier is analysed. If a signal interval is marked, the statistics are recomputed for intervals whose midpoint falls inside the selected interval.

Two rows are shown: without pauses and including pauses. The including-pauses row includes internal pauses only, excluding leading and trailing pauses. Durations are reported in milliseconds and rates in Hz. Both population and sample standard deviations are shown; the coefficient of variation uses population standard deviation.

The same duration-statistics object is included in the JSON report under duration_statistics.

Exports

TextGrid: Praat interval tiers for internally generated prompt/transcription annotations only, including pasted transcription. Imported or pasted TextGrid input is not re-exported as TextGrid.
JSON report: detailed metadata, settings, diagnostics, F0 summaries, RFA-FFT summaries, RFA-LPC summaries, and edit history.
CSV report: compact one-row summary for corpus tables and later aggregation.
ZIP: combined downloadable output bundle. It includes a TextGrid only when the current annotation was generated internally.
Save to DATA: direct TextGrid and report saving when a DATA project directory has been selected and write permission is available.
Numbered outputs: direct DATA saves use app-controlled non-overwriting numbered filenames, starting with _01 and searching up to _100. Browser downloads propose numbered filenames, but final collision handling also depends on the browser and operating system download behaviour.
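
The numbering rule for direct DATA saves can be sketched as a search for the first free suffix. The filename pattern stem_NN.ext and the function name are assumptions; only the _01 start and the _100 upper limit come from the description above.

```python
def next_numbered_name(stem, extension, existing_names, first=1, last=100):
    """Return the first free 'stem_NN.ext' name, or None if _01 to _100 are all taken."""
    taken = set(existing_names)
    for n in range(first, last + 1):
        candidate = f"{stem}_{n:02d}.{extension}"
        if candidate not in taken:
            return candidate
    return None  # refuse to overwrite: the caller must handle the exhausted case

existing = {"session_01.TextGrid", "session_02.TextGrid"}
print(next_numbered_name("session", "TextGrid", existing))
```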

Running and Rebuilding

Running the modular app

cd app
python -m http.server

Open the local server URL shown by Python, typically http://localhost:8000. A local server is recommended because ES modules are more reliable from http://localhost than from file://.

Running the single-file build

Open the generated HTML file in dist/. The single-file build is regenerated from the modular source; do not edit it directly.

Rebuilding the single-file version

python tools/build_singlefile.py

Terminology Notes

RFA-FFT: Rhythm formants based on a non-parametric FFT spectrum of the envelope.
RFA-LPC: Rhythm formants based on a parametric LPC/all-pole model of the envelope.
Phone formants: Vocal-tract resonance frequencies such as F1, F2, and F3. These are not the same as rhythm-formant peaks.
Rhythm-formant peaks: Low-frequency modulation or temporal-envelope peaks in the speech signal, interpreted as acoustic evidence for rhythm-scale organisation.
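To make the RFA-FFT idea concrete, the following pure-Python sketch builds a synthetic amplitude-modulated signal, derives a smoothed envelope, and looks for the strongest low-frequency component of that envelope. It is a toy illustration under stated assumptions, not the app's analysis: the rectify-and-average envelope, the direct DFT over low-frequency bins, the function names, and all parameter values are chosen here for clarity only.

```python
import cmath
import math


def am_envelope(samples, window):
    """Rectify the signal and smooth it with a centred moving average."""
    rectified = [abs(x) for x in samples]
    half = window // 2
    envelope = []
    for i in range(len(rectified)):
        lo = max(0, i - half)
        hi = min(len(rectified), i + half + 1)
        envelope.append(sum(rectified[lo:hi]) / (hi - lo))
    return envelope


def envelope_spectrum(envelope, sample_rate, max_hz):
    """Magnitude spectrum of the mean-removed envelope up to max_hz.

    A direct DFT is evaluated at the low-frequency bins only, which is
    enough for a short demonstration; an FFT would be used in practice.
    """
    n = len(envelope)
    mean = sum(envelope) / n
    centred = [v - mean for v in envelope]
    resolution = sample_rate / n  # bin spacing in Hz
    spectrum = []
    k = 1
    while k * resolution <= max_hz:
        coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                    for i, v in enumerate(centred))
        spectrum.append((k * resolution, abs(coeff) / n))
        k += 1
    return spectrum


# Synthetic test signal: 200 Hz carrier, amplitude-modulated at 5 Hz, 2 s long.
sample_rate = 1000
signal = [
    (1.0 + 0.8 * math.cos(2 * math.pi * 5 * t / sample_rate))
    * math.sin(2 * math.pi * 200 * t / sample_rate)
    for t in range(2 * sample_rate)
]
envelope = am_envelope(signal, window=25)
spectrum = envelope_spectrum(envelope, sample_rate, max_hz=20)
peak_hz, peak_magnitude = max(spectrum, key=lambda pair: pair[1])
```

For this synthetic signal the strongest envelope component lies at the 5 Hz modulation rate, far below the 200 Hz carrier. This is the sense in which rhythm-formant peaks differ from phone formants such as F1, F2, and F3.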

Current Limitations

The app remains a single-prompt, single-WAV tool.
The segmentation method is heuristic and transcript-constrained; it is not a trained forced aligner.
RFA-FFT and RFA-LPC comparison is not automated.
Phone-formant tracking is not yet implemented.
Batch processing, distance mapping, hierarchical clustering, and k-means analysis are reserved for later versions.
Canvas plots now have concise accessible labels and stronger keyboard focus cues, but they are not yet complete text equivalents for all visual detail.

Version Notes

V5.9.2 is a distribution cleanup release: it improves accessibility semantics for the interactive TGA duration display, clears Time Group playback status reliably after playback ends, and avoids duplicate pasted-TextGrid parse warnings.

V5.9.1 adds inline TGn playback buttons to the visible TGA duration-bar display. Clicking a TGn button plays the audio span of that Time Group; while audio is playing, clicking any TGn button stops playback.

V5.7.5 fixes the Annotation status in the I/O Process card for acceptable loaded or pasted TextGrid input. Annotation is shown as “generated” when WAV audio is loaded, or as “generated, no audio” when no WAV is loaded. Processing logic is unchanged.

V5.7.4 fixed misleading I/O Process card wording for TextGrid input by removing the generic phrase “automatic generation”.

V5.7.3 removed remaining duplicated status wording from the I/O Process card, for example avoiding “waiting … [waiting]”, and clarified the destructive clear action as clearing TGAplus local memory rather than general browser settings. Processing logic is unchanged.

V5.7.2 refines the I/O Process card status lines so that each row shows source/context plus a single status badge. It distinguishes transcription file, pasted transcription, TextGrid file, and pasted TextGrid sources where relevant, and includes selected tier names where useful. Processing logic is unchanged.

V5.7.1 replaces the old Run Segmentation / Run TGA controls in the I/O Process card with automatic processing status plus three explicit actions: regenerate results; clear inputs and results; and clear inputs, results, and TGAplus local memory. Annotation edits now mark TGA as stale during editing, and TGA is regenerated automatically when the user leaves the Annotation tab.

V5.7.0 adds unified pasted-text input. The paste dialog accepts either a plain transcription or a Praat TextGrid. TextGrid status is detected only from standard Praat long or short TextGrid headers; all other non-empty pasted text is treated as a transcription. Pasted transcriptions follow the normal prompt/transcription workflow and can generate TextGrid output after segmentation. Pasted TextGrids remain read-only TextGrid inputs and are not re-exported as TextGrid.

V5.6.13 aligned documentation and lightweight accessibility. V5.6.12 compacted the TextGrid/WAV information popup and increased the height of the extracted/prompt text preview in the Text tab. V5.6.11 compacted the top card of the Text tab. V5.6.10 fixed caption and legend overlap in signal and transform displays. V5.6.9 updated package/UI versioning, restricted TextGrid export to internally generated prompt-based annotations, kept imported or pasted TextGrid input out of TextGrid export paths, and wired up the existing low-risk segmentation controls that already have a meaning in current processing. Potential algorithmic changes for currently non-functional controls, such as punctuation weighting and envelope-derivative sensitivity, remain deferred.

V5.5.0 adds security and packaging hardening: safer filename/status handling, sanitised SVG exports, DATA-directory side-effect warnings, large-export safeguards, privacy/security documentation, and current-only standalone packaging. V5.4.9 adds input-size warnings and shows WAV/TextGrid file sizes in the compatibility popup. V5.4.8 compacts the Finite State Phonology diagram display by visually collapsing final epsilon transitions, bundling tone alternatives on terminal transitions, and adding a Show tones toggle. V5.4.7 moves the TextGrid/WAV info button immediately after the Select DATA Project Directory button for easier access. V5.4.6 renames the web app to TGAplus. V5.4.5 moves TextGrid/WAV compatibility diagnostics into an informative popup that opens automatically when a TextGrid is loaded or pasted, and can later be reopened from the I/O tab.

V5.4.3 keeps the Run TGA button enabled on startup. V5.4.2 adjusts I/O button layout. V5.4.1 adds TGA explanatory notes and metric ranges. V5.4.0 adds Time Tree similarity and robustness. V5.3.7 adds Time Tree processing strategies. V5.3.6 extends RFA spectrum peak markers and adds the TextGrid-loading OR separator. V5.3.5 fixes TGA duration PNG export. V5.3.4 adds figure export buttons. V5.3.0 implements TGA Phase D structural visualisation.

V5.2.3 reorganises I/O workflow controls, moves exports to the Output card, and adds auto-run processing. V5.2.2 adds TGA D-Wiggliness and D-Spaciousness. V5.2.0 implements TGA Phase C. V5.1.4 compacts the TGA global statistics card. V5.1.3 preserves punctuation in TGA Time Group text. V5.1.1 changes the TGA Phase B duration display to vertical bars. V5.1.0 implements TGA Phase B. V5.0.3 builds the FSA from plain transcription input or selected TextGrid tier text. V5.0.2 adds Paste TextGrid input. V5.0.0 adds the TGA tab.

V4.0.24 adds non-overwriting numbered output filenames (_01…_100) for TextGrid and report exports.

V4.0.23 corrects the TGA-style including-pauses row so it includes internal pauses only and excludes initial and final pauses.

V4.0.22 narrows the banner reveal region so tab buttons do not trigger it and places the DATA project-directory button after the manual transcript and WAV buttons.

V4.0.20 moves the TGA-style duration statistics table to the top of the Annotation tab and turns the startup banner into a top-edge reveal overlay with anti-flicker thresholds.

V4.0.19 adds TGA-style selected-tier duration statistics in the Annotation tab and includes the same measures in the JSON report.

V4.0.18 fixes selection-drag playback behaviour in Sound/Annotation and improves TextGrid input support, including Praat short text TextGrids and persistence of loaded TextGrid tiers after audio loading.

V4.0.15 revised the DATA project-directory workflow: expected folders are created when possible, and multiple input files trigger an in-app chooser.

V4.0.14 introduced compact tab labels: I/O, Text, Sound, Annotation, RFA-FFT, and RFA-LPC.

V4.0.13 standardised RFA-FFT/RFA-LPC terminology and enlarged graph fonts.

V4.0.12 centralised shared selection and zoom across the four acoustic tabs.

V4.0.9–V4.0.11 added RFA-LPC stepped-band inspection, sweep overview, readable peak labels, transparent label backgrounds, and dot-redraw ordering.

V4.0.8 added RFA-LPC envelope-source choices including squared AM, RMS power, Hilbert amplitude, and Hilbert power.

V4.0 introduced the envelope-LPC rhythm analysis strategy that is now labelled RFA-LPC.