Transcription involves many issues of principle that phoneticians and linguists have debated for decades. These issues may be new, though, to many engineers and speech technologists. Among them are (i) whether the notation should be phonemic, or to some extent allophonic, and if phonemic, how the phoneme set is to be established; (ii) to what extent phonetic symbols should be required to have the same meaning across different languages; and (iii) the relation between the basic, lexical pronunciation of a word and its actual pronunciation in context.
In principle, SAMPA provides for phonemic notation of languages. For example, the r-sounds of English rip, trip, and drip are all instances of the phoneme /r/, although they differ articulatorily and acoustically (in voicing and in the presence or absence of friction). These different allophones are predictable from the phonetic context, so we can unambiguously write them all as /r/. The arguments for preferring phonemic notation to allophonic are (i) it is simpler while still being unambiguous; (ii) correct identification of allophones may be difficult for those without phonetic training; and (iii) too few codes are available in the range 32-127 to provide for all allophones.
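The constraint in point (iii) can be made concrete with a minimal sketch. It assumes only the 32-127 code range stated above and three well-known SAMPA equivalents; the function names and the tiny mapping table are hypothetical illustrations, not part of any SAMPA tool.

```python
# Illustrative subset of IPA-to-SAMPA equivalents (not a complete table).
IPA_TO_SAMPA = {
    "ð": "D",  # voiced dental fricative
    "ə": "@",  # schwa
    "ʁ": "R",  # uvular r
}

def in_sampa_range(text):
    """Return True if every character has a code in the range 32-127."""
    return all(32 <= ord(ch) <= 127 for ch in text)

def to_sampa(ipa):
    """Replace each IPA symbol found in the table; leave other characters alone."""
    return "".join(IPA_TO_SAMPA.get(ch, ch) for ch in ipa)

if __name__ == "__main__":
    print(in_sampa_range("ðen"))  # the IPA symbol lies outside the range
    print(to_sampa("ðen"))
    print(in_sampa_range(to_sampa("ðen")))
```

With fewer than a hundred usable codes, every symbol spent on an allophone is one that cannot be used for a phoneme elsewhere.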
In syllable-initial position, English /t/ is alveolar and aspirated; French /t/, dental and unaspirated; Swedish /t/, dental and aspirated. We ignore these comparative differences in our notation, writing all as /t/. SAMPA does not need to adopt distinct symbols to reflect these differences. (However, if and when SAMPA is applied to Hindi, for example, where these differences are phonemic, it would become necessary to notate them explicitly.)
In continuous speech the actual sounds used in pronouncing a word may well differ from the word's citation form (dictionary entry). A phonotypical transcription is one in which citation forms are modified in accordance with known phonetic rules of connected speech. For example, in a phonotypical transcription of English, final linking /r/ would be shown before a following vowel (better ask) but not before a consonant (better go); the lexical entry would be invariant. In an actual utterance the speaker might or might not conform to phonotypical expectations; an impressionistic transcription reflects a human (or mechanical) auditory or acoustic analysis of what was actually said. In the case at issue, /r/ would be shown if phonetically present in a given instance, not otherwise.
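The linking-/r/ example can be sketched as a rule applied to invariant lexical entries. This is a minimal illustration under stated assumptions: the three-word lexicon, the flag marking words that can take linking /r/, the simplified transcriptions (stress marks omitted), and the set of vowel-initial characters are all hypothetical choices made for the example.

```python
# Hypothetical mini-lexicon: SAMPA citation form plus a flag saying whether
# the word can take a final linking /r/. The entries themselves never change.
LEXICON = {
    "better": ("bet@", True),
    "ask": ("A:sk", False),
    "go": ("g@U", False),
}

# Assumed set of characters that can begin an English vowel symbol in SAMPA.
VOWEL_STARTS = set("iIeE{aAQOoUuV3@")

def phonotypical(words):
    """Derive a phonotypical transcription from citation forms.

    Linking /r/ is added only when the next word begins with a vowel.
    """
    result = []
    for i, word in enumerate(words):
        form, linking_r = LEXICON[word]
        next_form = LEXICON[words[i + 1]][0] if i + 1 < len(words) else ""
        if linking_r and next_form and next_form[0] in VOWEL_STARTS:
            form += "r"
        result.append(form)
    return result

if __name__ == "__main__":
    print(phonotypical(["better", "ask"]))
    print(phonotypical(["better", "go"]))
```

An impressionistic transcription, by contrast, cannot be derived by rule in this way, since it records what a particular speaker actually said.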
In practice, colleagues working on the various languages to which SAMPA has been applied have chosen to deviate in various respects from these principles. English has plosive /d/ and fricative /ð/ (SAMPA /D/) as distinct phonemes (den, then). In Spanish, they are undoubtedly allophones of the same phoneme, and could unambiguously both be written /d/; but for speech technology work our Spanish colleagues prefer to notate them distinctively, as "d" and "D" respectively. The r-sounds in French rouge, lettre are different from all the English r-sounds, being respectively a voiced and a voiceless uvular fricative. It would seem unambiguous and logical to write them, too, as /r/. But our French colleagues have preferred to use the distinct uvular-r symbol, also provided in SAMPA, namely /R/.
Nevertheless, I believe we should as far as possible discourage allophonic and comparative notation. Bulgarian has a simple 6-vowel system, IPA /i e a o u ə/. A colleague in Bulgaria has proposed that they be represented in SAMPA as /I, E, a, O, U, @/. About /a/ and /@/ (= IPA /ə/) we can agree. But the other symbols he proposes are inappropriately comparative. The Bulgarian vowels should appear in SAMPA as /i, e, a, o, u, @/.
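The difference between the two Bulgarian proposals amounts to a four-symbol substitution, sketched below. The function name is hypothetical, and the sketch assumes its input contains vowel symbols only; applied to a full transcription, the same capitals could collide with consonant symbols used in other SAMPA inventories.

```python
# Map the comparative vowel symbols of the proposal to the recommended
# phonemic ones. /a/ and /@/ are already agreed and stay unchanged.
COMPARATIVE_TO_PHONEMIC = str.maketrans({"I": "i", "E": "e", "O": "o", "U": "u"})

def to_phonemic_vowels(vowels):
    """Rewrite a string of Bulgarian vowel symbols in the recommended notation."""
    return vowels.translate(COMPARATIVE_TO_PHONEMIC)

if __name__ == "__main__":
    print(to_phonemic_vowels("I E a O U @"))
```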