A *population* is the collection of all objects
that are of interest for the task in hand. In the earlier example,
all inhabitants of the country are the population.
Here the everyday use of the term
`population' corresponds to its use in a statistical sense.
Though population in a statistical sense can have the same meaning
as in the geographical sense, this need not be the case. For
instance, the population of users of a bank's speaker verification system
would comprise only the clients of the bank. Nor does a population
have to consist of humans: for example, the population of
/p/ phonemes of a speaker would be all of the
instances of that phoneme the speaker ever produces.

A *variable* is an attribute whose value, either numerical or a category
such as sex, is associated with each unit of the population.
Variables are classed as either *independent* or
*dependent variables*. An independent variable
is one that is
controlled or manipulated by the experimenter. So, for example,
when setting up a corpus, the experimenter might consider it necessary
to ensure that as many females are recorded in the test data as
males. Sex would then be an independent variable (independent
variables are also referred to as *factors*,
particularly in connection with the statistical technique *Analysis of
Variance, ANOVA*, discussed in Section 9.2.6).
A dependent variable is
a variable that the investigator measures to determine the effect of the
independent variable. Thus you might need to
ascertain whether recognition accuracy (dependent variable)
is affected by the sex of the speakers (independent
variable).
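As a minimal sketch of the distinction, the Python below represents a hypothetical evaluation as records pairing each speaker's sex (the independent variable, fixed by the design of the corpus) with a recognition accuracy (the dependent variable, measured afterwards). The record layout and the helper names `is_balanced` and `mean_by_level` are illustrative assumptions, and the accuracy values are invented, not results.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical evaluation records: (speaker_id, sex, recognition_accuracy).
# Sex is the independent variable (chosen by the experimenter when selecting
# speakers); accuracy is the dependent variable (measured afterwards).
# The numbers are invented purely for illustration.
records = [
    ("s01", "female", 0.91),
    ("s02", "female", 0.88),
    ("s03", "female", 0.93),
    ("s04", "male", 0.90),
    ("s05", "male", 0.86),
    ("s06", "male", 0.89),
]


def is_balanced(rows):
    """Return True if every level of the factor has the same number of speakers."""
    counts = defaultdict(int)
    for _, sex, _ in rows:
        counts[sex] += 1
    return len(set(counts.values())) == 1


def mean_by_level(rows):
    """Mean of the dependent variable for each level of the independent variable."""
    groups = defaultdict(list)
    for _, sex, accuracy in rows:
        groups[sex].append(accuracy)
    return {level: mean(values) for level, values in groups.items()}


print(is_balanced(records))
print(mean_by_level(records))
```

Comparing the two group means only describes this particular set of speakers; deciding whether a difference reflects a real effect of sex calls for an inferential technique such as the ANOVA mentioned above.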

When a variable is measured on all units of a population, a full census has been taken. If census data could always be obtained, there would be no need for inferential statistics. However, most language engineering applications (and, indeed, many other areas that require measurement) involve very large or infinite populations, such as the populations of speakers or phonemes illustrated earlier, so it is not possible to measure variables on all units. In these circumstances, a finite sample is taken, and this sample is used to study the variable of concern in the population. So, if you wanted an idea of the average voice fundamental frequency of men, you might make measurements on a sample of 100 men. This sample is then studied as if it were representative of the population. The statistician can provide information about the relationship between a quantity measured on the sample (here, its mean) and the quantity the investigator is really interested in: the mean voice fundamental frequency of the population.
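The census/sample distinction can be made concrete with a small simulation. The sketch below assumes, purely for illustration, a population of 100,000 male F0 values drawn from a normal distribution with mean 120 Hz and standard deviation 20 Hz; these figures are hypothetical, not measurements. Because the population is simulated, its census mean can be computed directly and compared with the mean of a random sample of 100, as in the text.

```python
import random
from statistics import mean, stdev

# Hypothetical population: F0 values (Hz) for 100,000 men, drawn from an
# assumed normal distribution (mean 120 Hz, s.d. 20 Hz). A real population
# would be far too large to measure in full.
rng = random.Random(42)  # fixed seed so the sketch is reproducible
POPULATION_SIZE = 100_000
population = [rng.gauss(120.0, 20.0) for _ in range(POPULATION_SIZE)]

# Census: measure every unit. Only possible here because the data are simulated.
population_mean = mean(population)

# Sample: measure only 100 randomly chosen units (without replacement).
sample = rng.sample(population, 100)
sample_mean = mean(sample)

# Estimated standard error of the mean: a rough guide to how far the
# sample mean is likely to lie from the population mean.
standard_error = stdev(sample) / len(sample) ** 0.5

print(f"population mean (census): {population_mean:.1f} Hz")
print(f"sample mean (n=100):      {sample_mean:.1f} Hz")
print(f"estimated standard error: {standard_error:.1f} Hz")
```

The sample mean behaves as an estimate of the population mean only because the 100 units are drawn at random here; treating a sample "as if it were representative" is exactly that assumption.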