Populations, samples and other terminology

A population is the collection of all objects that are of interest for the task in hand. In the earlier example, all inhabitants of the country are the population. Here the everyday use of the term 'population' corresponds with its statistical use. Although a population in the statistical sense can coincide with a population in the geographical sense, it need not. For instance, the population of users of a bank's speaker verification system would comprise only the clients of that bank. Nor does a population have to consist of humans: the population of /p/ phonemes of a speaker would be all the instances of that phoneme the speaker ever produces.

A variable ranges over numerical values associated with each unit of the population. Variables are classed as either independent or dependent. An independent variable is one that is controlled or manipulated by the experimenter. For example, when setting up a corpus, the experimenter might consider it necessary to ensure that as many females as males are recorded in the test data. Sex would then be an independent variable. (Independent variables are also referred to as factors, particularly in connection with the statistical technique Analysis of Variance, ANOVA, discussed in Section 9.2.6.) A dependent variable is one that the investigator measures to determine the effect of the independent variable. Thus you might need to ascertain whether recognition accuracy (the dependent variable) is affected by the sex of the speakers (the independent variable).
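The distinction between the two kinds of variable can be made concrete with a minimal Python sketch. It groups a dependent variable (recognition accuracy) by the levels of an independent variable (speaker sex) and averages within each group. The function name mean_by_factor and the accuracy figures are invented for illustration only; they are not measurements from any real system.

```python
from collections import defaultdict
from statistics import mean


def mean_by_factor(records):
    """Average the dependent variable separately for each factor level.

    records is an iterable of (level, value) pairs, where level is the
    value of the independent variable and value is the measurement.
    """
    groups = defaultdict(list)
    for level, value in records:
        groups[level].append(value)
    return {level: mean(values) for level, values in groups.items()}


# Invented data: (sex of speaker, recognition accuracy for that speaker).
records = [
    ("female", 0.92),
    ("male", 0.90),
    ("female", 0.88),
    ("male", 0.94),
]

accuracy_by_sex = mean_by_factor(records)
for level, accuracy in sorted(accuracy_by_sex.items()):
    print(f"{level}: mean accuracy {accuracy:.2f}")
```

A difference between the group means in such a table does not by itself show an effect of the factor; deciding whether it is larger than chance variation is the job of a technique such as ANOVA.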

When a variable is measured on all units of a population, a full census has been taken. If it were always possible to obtain census data, there would be no need for statistics. However, most language engineering applications (and, indeed, many other fields that require measurement) involve very large or infinite populations, such as the populations of speakers or phonemes illustrated earlier, so it is not possible to measure variables on all units. In these circumstances, a finite sample is taken and used to study the variable of concern in the population. So, if you wanted an idea of the average voice fundamental frequency of men, you might make measurements on a sample of 100 men. This sample is then studied as if it were representative of the population. The statistician is able to provide information about the relationship between a quantity measured on the sample (here its mean) and what the investigator is really interested in: the mean voice fundamental frequency of the population.
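The relationship between a sample mean and the population mean can be illustrated with a small simulation. The sketch below is hypothetical throughout: the population is generated artificially, the figures of 120 Hz and 20 Hz are assumptions chosen only to give plausible-looking numbers, and the function name sample_mean_with_se is invented here. It draws a simple random sample of 100 units, computes the sample mean, and estimates its standard error as the sample standard deviation divided by the square root of the sample size.

```python
import math
import random
import statistics


def sample_mean_with_se(population, n, seed=0):
    """Draw a simple random sample of size n (without replacement) and
    return its mean together with the estimated standard error of that mean."""
    rng = random.Random(seed)
    sample = rng.sample(population, n)
    sample_mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return sample_mean, standard_error


# Artificial population of fundamental frequency values in Hz.
# The mean (120) and spread (20) are assumptions for illustration only.
rng = random.Random(42)
population = [rng.gauss(120.0, 20.0) for _ in range(100_000)]

# In a simulation the "census" value is available for comparison.
population_mean = statistics.mean(population)

sample_mean, standard_error = sample_mean_with_se(population, n=100)

print(f"population mean: {population_mean:.1f} Hz")
print(f"sample mean:     {sample_mean:.1f} Hz")
print(f"standard error:  {standard_error:.1f} Hz")
```

In a real study the population mean is unknown, so only the last two figures would be available. The standard error indicates how far the sample mean is likely to lie from the population mean, provided the sample was drawn at random.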

EAGLES SWLG SoftEdition, May 1997.