This paper introduces the design and composition of general language corpora and considers the problem of statistical representativeness. To determine, check and document the composition of a corpus, a classificatory scheme for text types is needed. The building of a corpus is seen as an iterative process: the original design may set up overall quantitative measures for a few easily distinguishable and rather general text types; after some texts or text samples have been collected, which should be annotated according to a more fine-grained classification scheme, the distribution over more specific classes (e.g. topic within a general class of non-fiction) is measured, and the design criteria are adjusted accordingly. The design ...