Non-Deterministic Finite State Transducers (NDFST)

Dafydd Gibbon, U Bielefeld, 2012-09-13

General intro

An NDFST is a variety of Finite State Transducer, FST, and defines a kind of translation between two codes. Check this example, which translates very restricted English expressions into German and vice versa. The demo shows how a single input string can have many translations. The FST can be represented as a network:

Skip to the Practical Stuff section if you're not really interested in Formal stuff.

Formal Stuff

A Finite State Transducer, FST (also known as a Finite State Machine or State Machine) defines a relation between two sets of strings of symbols, an input set and an output set (whose roles can be reversed). The sets of strings must be regular sets, i.e. the strings must be composed by concatenating to the left or to the right (but not both), and not by insertion into the middle. The sets may be infinite, as in {big, very big, very very big, ...}. The relation between regular sets is a regular relation.
The FST is defined by specifying (1) a set of states; (2) the start state; (3) a subset of states which are possible end states; (4) a set of transitions between states representing (5) a relation between input and output symbols (see the Figure).
A Deterministic Finite State Transducer, DFST, is a variety of FST where for each input symbol at a given state there is only one path leading to another state.
A Non-Deterministic Finite State Transducer, NDFST or NFST is a variety of FST where for each input symbol at a given state there may be more than one path leading to another state, needing additional criteria for deciding which path to take.
An interpreter for an FST processes an FST and translates input strings into output strings. An FST interpreter requires only a finite memory, which is a function of the number of states. The interpreter developed for this workbench takes the easy way out and produces all possible translations of the input string.
If you want to know more, check Wikipedia on Finite State Transducer.

Practical stuff

Input:
The input is a string consisting of a sequence of symbols which are transduced into the output strings which the FST defines.

Note that calculation takes time if the FST is complex and the input is long - watch your browser's download indicator!
The switch settings which inflence the speed are Numbering, Graph, Compact.

Notes on switch settings:
Trace=on is normally needed only for debugging.
Numbering=on additionally displays output item count and (with arrows) transducer Direction selection.
SAMPA is only switched off for phonetic representations.
Graph=on is useful for visualisation in general (in which case it is only needed once - the graph can be saved), but also for debugging.
LR/TB defines the orientation of the graph as left-right or top-down.
Compact=yes shows the graph with sets of parallel transitions reduced to a single arbitrary member of the transition set.
Direction permits input and output to be exchanged.
Relation determines whether only the output symbols or input and output symbol pairs are to be displayed.
Autoinput is only required for some special applications.

Specimen form layout:

Input:

Switches:

Trace:	Numbering:	SAMPA:

Graph:	LR/TB:	Compact:

Direction:	Relation:	Autoinput:

FST:

# Filename: engsyll04-ndfst.html
# Data: Dafydd Gibbon
# FST: Dafydd Gibbon
# Date: 2012-08-26
# Description: English strong syllable phonotactics
# Notes:
# - Technical trick: each sound maps to 'x'; this enables sets
#   of syllables of given lengths to be generated. Thus 'x x'
#   generates the set of 2-sound syllables, up to the maximum
#   of 6: 'x x x x x x'.
# - Not included
#	- 'foreign' pronunciations: [t]sunami, [t]setse (fly)
#	- spelling pronunciations: 'knee', 'gnu', 'psych'
#	- inflections: -s, -z, -t, -d
#	- weak syllables: cf. de-(cid)-ed-ly
# - The set of existing syllables in actual words in the dictionary
#   is much smaller. The set generated here is the set of
#   potential regular syllables, which can be used, for instance,
#   in humorous rhymes or new brand names. The set is made by
#   combining all possible initial consonant clusters with all
#   possible vowel and final consonant clusters, based on
#	- existing dictionary combinations,
#	- (with vowels) in natural vowel classes.
# - To get the full set (31761) just click on SEND and be patient!
 
initial=q1
terminal=q11, q12, q13, q14, q15, q16, q17
fst=
# Onsets
#   Consonant (glottal stop or zero is '?')
	q1,?,x,q9;
	q1,p,x,q9;
	q1,b,x,q9;
	q1,t,x,q9;
	q1,d,x,q9;
	q1,k,x,q9;
	q1,g,x,q9;
	q1,f,x,q9;
	q1,v,x,q9;
	q1,T,x,q9;
	q1,D,x,q9;
	q1,tS,x,q9;
	q1,dZ,x,q9;
	q1,s,x,q9;
	q1,z,x,q9;
	q1,S,x,q9;
	q1,Z,x,q9;
	q1,h,x,q9;
	q1,m,x,q9;
	q1,n,x,q9;
	q1,l,x,q9;
	q1,r,x,q9;
	q1,j,x,q9;
	q1,w,x,q9;
#   Obstruent + Gliquid
	q1,t,x,q4;
	q1,d,x,q4;
	q1,T,x,q4;
	q1,S,x,q4;
	q1,t,x,q5;
	q1,d,x,q5;
	q1,h,x,q5;
	q1,p,x,q6;
	q1,b,x,q6;
	q1,f,x,q6;
	q1,k,x,q7;
	q1,g,x,q7;
#   s + ...
	q1,s,x,q2;
# q2
#  post-s AnteriorVoicelessCons
	q2,p,x,q9;
	q2,t,x,q9;
	q2,k,x,q9;
	q2,m,x,q9;
	q2,n,x,q9;
	q2,l,x,q9;
	q2,w,x,q9;
#   post-s VoicelessStop + Gliquid
	q2,t,x,q4;
	q2,p,x,q6;
	q2,k,x,q7;
#   post-s VoicelessStop + Gliquid
	q4,r,x,q9;
#   post-s VoicelessStop + Gliquid
	q5,w,x,q9;
#   post-s VoicelessStop + Gliquid
	q6,r,x,q9;
	q6,l,x,q9;
#   post-s VoicelessStop + Gliquid
	q7,r,x,q9;
	q7,l,x,q9;
	q7,w,x,q9;
#  post-s VoicessCons + j
	q2,p,x,q3;
	q2,t,x,q3;
	q2,k,x,q3;
#  Consonant + j + u onset
	q1,p,x,q3;
	q1,b,x,q3;
	q1,t,x,q3;
	q1,d,x,q3;
	q1,k,x,q3;
	q1,g,x,q3;
	q1,f,x,q3;
	q1,v,x,q3;
	q1,T,x,q3;
	q1,h,x,q3;
	q1,m,x,q3;
	q1,n,x,q3;
# j
	q3,j,x,q8;
	q8,u:,x,q11;
# Nucleus
#  Long/tense vowel
	q9,i:,x,q11;
	q9,eI,x,q11;
	q9,aI,x,q11;
	q9,OI,x,q11;
	q9,u:,x,q11;
	q9,@U,x,q11;
	q9,aU,x,q11;
	q9,3:,x,q11;
	q9,A:,x,q11;
	q9,O:,x,q11;
# Centring schwa diphthongs
	q9,I@,x,q12;
	q9,e@,x,q12;
	q9,U@,x,q17;
# Short vowel + coda
	q9,I,x,q10;
	q9,E,x,q10;
	q9,{,x,q10;
#	q9,@,x,q10;
	q9,Q,x,q10;
	q9,V,x,q10;
	q9,U,x,q10;
# Coda
#  post long vowel coda
	q11,p,x,q17;
	q11,b,x,q17;
	q11,t,x,q17;
	q11,d,x,q17;
	q11,k,x,q17;
	q11,g,x,q17;
	q11,tS,x,q17;
	q11,dZ,x,q17;
	q11,f,x,q17;
	q11,v,x,q17;
	q11,T,x,q17;
	q11,D,x,q17;
	q11,s,x,q17;
	q11,z,x,q17;
	q11,S,x,q17;
	q11,Z,x,q17;
	q11,m,x,q17;
	q11,n,x,q17;
	q11,l,x,q17;
	q11,l,x,q11c;
# beast, roost, moist, ...
	q11,s,x,q11a;
# warmth
	q11,m,x,q11b;
# (beast, roost, moist, ...)
	q11a,t,x,q17;
	q11b,T,x,q17;
# (wield)
	q11c,d,x,q17;
# beard-weird
	q12,d,x,q17;
	q12,n,x,q17;
# Simple obstruent coda
	q10,p,x,q17;
	q10,b,x,q17;
	q10,t,x,q17;
	q10,d,x,q17;
	q10,k,x,q17;
	q10,g,x,q17;
	q10,tS,x,q17;
	q10,dZ,x,q17;
	q10,f,x,q17;
	q10,v,x,q17;
	q10,T,x,q17;
	q10,D,x,q17;
	q10,s,x,q17;
	q10,z,x,q17;
	q10,S,x,q17;
	q10,Z,x,q17;
# Restricted obstruent cluster codas
# adze, badge
	q10,d,x,q10d;
# clasp, fast, ask
	q10,s,x,q10a;
# apt, act; apse, axe
	q10,p,x,q10b;
	q10,k,x,q10b;
# raft
	q10,f,x,q10c;
# (clasp, fast, ask)
	q10a,p,x,q17;
	q10a,t,x,q17;
	q10a,k,x,q17;
# (apt,act; apse, axe)
	q10b,t,x,q17;
	q10b,s,x,q17;
# (raft)
	q10c,t,x,q17;
# adze; batch, badge
	q10d,T,x,q17;
	q10d,z,x,q17;
	q10d,S,x,q17;
	q10d,Z,x,q17;
# Sonorant + obstruent codas
# - sonorant
	q10,l,x,q13;
	q10,n,x,q14;
	q10,m,x,q15;
	q10,N,x,q16;
# post-l 
	q13,p,x,q17;
	q13,t,x,q17;
	q13,d,x,q17;
	q13,k,x,q17;
	q13,tS,x,q17;
	q13,dZ,x,q17;
	q13,f,x,q17;
	q13,v,x,q17;
	q13,T,x,q17;
	q13,s,x,q17;
	q13,S,x,q17;
# post-n, lint, wind, plinth,...
	q14,t,x,q17;
	q14,d,x,q17;
	q14,T,x,q17;
	q14,s,x,q17;
	q14,z,x,q17;
	q14,tS,x,q17;
	q14,dZ,x,q17;
# post-m
	q15,p,x,q17;
#  bumf (triumph)
	q15,f,x,q17;
# post-N
	q16,k,x,q17;
	q16,T,x,q17

Dafydd Gibbon, 2012-09-13