Next: Sampling Up: Statistical and experimental procedures Previous: Statistical analysis

## Populations, samples and other terminology

A population is the collection of all objects that are of interest for the task in hand. In the earlier example, all inhabitants of the country are the population. Here the everyday use of the term 'population' corresponds with its use in a statistical sense. Though population in the statistical sense can have the same meaning as in the geographical sense, this need not be the case. Thus, for instance, the population of users of a bank's speaker verification system would comprise only the clients of the bank. Nor does population refer only to humans: for example, the population of /p/ phonemes of a speaker would be all of the instances of that phoneme the speaker ever produces.

A variable ranges over numerical values associated with each unit of the population. Variables are classed as either independent or dependent variables. An independent variable is one that is controlled or manipulated by the experimenter. So, for example, when setting up a corpus, the experimenter might consider it necessary to ensure that as many females as males are recorded in the test data. Sex would then be an independent variable. (Independent variables are also referred to as factors, particularly in connection with the statistical technique Analysis of Variance, ANOVA, discussed in Section 9.2.6.) A dependent variable is a variable that the investigator measures to determine the effect of the independent variable. Thus you might need to ascertain whether recognition accuracy (the dependent variable) is affected by the sex of the speakers (the independent variable).
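The distinction between the two kinds of variable can be sketched in a few lines of Python. This is only an illustration: the accuracy figures are invented for the example and are not measurements, and the names `trials` and `mean_by_factor` are hypothetical. Sex is the factor (independent variable) whose levels are balanced by design, and recognition accuracy is the dependent variable summarised within each level.

```python
from collections import defaultdict
from statistics import mean

# Invented illustration data: (level of the factor "sex", recognition accuracy).
# The design is balanced: three speakers per level, as chosen by the experimenter.
trials = [
    ("female", 0.92), ("female", 0.88), ("female", 0.90),
    ("male", 0.85), ("male", 0.91), ("male", 0.87),
]

def mean_by_factor(records):
    """Group the dependent variable by factor level and average each group."""
    groups = defaultdict(list)
    for level, value in records:
        groups[level].append(value)
    return {level: mean(values) for level, values in groups.items()}

accuracy_by_sex = mean_by_factor(trials)
for level, accuracy in accuracy_by_sex.items():
    print(f"{level}: mean accuracy = {accuracy:.3f}")
```

Comparing the group means only describes this particular set of speakers. Whether a difference between the levels reflects the population is the question that techniques such as ANOVA address.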

When a variable is measured on all units of a population, a full census has been taken. If it were always possible to obtain census data, there would be no need for statistics. However, most language engineering applications (and, indeed, many other fields that require measurement) involve very large or infinite populations, such as the populations of speakers or phonemes illustrated earlier, so it is not possible to measure variables on all units. In these circumstances, a finite sample is taken. This sample is used to study the variable of concern in the population. So, if you wanted an idea of the average voice fundamental frequency of men, you might make measurements on a sample of 100 men. This sample is then studied as if it were representative of the population. The statistician is able to provide information about the relationship between a quantity measured on the sample (here its mean) and what the investigator is really interested in: the mean voice fundamental frequency of the population.
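The relationship between a sample mean and a population mean can be explored with a small simulation. The sketch below is hedged in several ways: the population is synthetic, the Gaussian shape and the values of 120 Hz (mean) and 20 Hz (standard deviation) are assumptions chosen purely for illustration, and the function names are hypothetical. In a real study the population mean is unknown; it is computed here only because the population is simulated.

```python
import random
import statistics

def make_population(size, seed=0):
    """Simulate fundamental frequency values (Hz) for a large population.

    Assumption for illustration only: values are Gaussian, mean 120, sd 20.
    """
    rng = random.Random(seed)
    return [rng.gauss(120.0, 20.0) for _ in range(size)]

def sample_mean(population, n, seed=1):
    """Draw a simple random sample of n units and return its mean."""
    rng = random.Random(seed)
    sample = rng.sample(population, n)  # sampling without replacement
    return statistics.mean(sample)

population = make_population(100_000)
population_mean = statistics.mean(population)  # the "census" value
estimate = sample_mean(population, 100)        # what the investigator observes

print(f"population mean: {population_mean:.2f} Hz")
print(f"sample mean (n=100): {estimate:.2f} Hz")
```

Rerunning `sample_mean` with different seeds gives slightly different estimates; that sample-to-sample variation is what statistical inference quantifies.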


EAGLES SWLG SoftEdition, May 1997.