With few exceptions, the texts in NL corpora have previously been published. From a legal point of view,
this implies that any use of electronic copies should adhere to copyright rules and regulations. In most
countries copyright laws were passed long before the era of electronic publishing. However, laws designed
to protect printed materials may not be optimal for the protection of machine readable text. Neither is it
obvious how abuse of electronic texts can be detected and prevented. These problems have considerably
impeded the distribution of NL corpora, and it would be optimistic to suggest that all of them are
close to a solution.
For SL corpora the legal issues are even less well understood. Does a speaker who is recorded while
reading sentences presented by an experimenter have any legal rights with respect to the sounds produced?
Recordings of spontaneous speech are even more complex in this respect, since a speaker might claim
rights as to the contexts and details of the formulations used. If speakers are recruited to contribute to an
SL corpus, legal problems can be avoided by requesting them to sign a consent form. Building corpora
from existing recordings (e.g. from radio and television broadcasts) is more difficult in this respect,
because it may not always be feasible to contact all relevant speakers. Under the law of EU countries,
unauthorised re-broadcast of recordings made from radio or television is illegal. The legal status of
limited redistribution of recordings for research and development in speech science and technology is
less clear. For more information on this topic, see Section 4.3.4.