Corpus size has traditionally been measured in number of words. For work on a single (European) language, this is adequate in most circumstances. Problems arise when corpora are compared cross-linguistically: accurately measuring the amount of linguistic material represented in a corpus is pivotal for many analyses, including such corpus-linguistic staples as keyword analyses, frequency measurements of linguistic items and normalisation of frequencies. Yet words in different languages are very different things. Isolating languages like English use a high number of words compared to synthetic and polysynthetic languages, which use drastically fewer words to express similar messages (e.g. German Donaudampfschifffahrtsgesellschaftskapit...
For almost two decades now, mainstream corpus-based research in descriptive translation studies has ...
With this article, we seek to support the law of growing standardization by showing that texts trans...
The quality of statistical measurements on corpora is strongly related to a strict definition of the...
Equating corpus sizes (left) resulted in average word frequencies that were comparable across lan...
Recently, textual characteristics, i.e. certain language statistics, have been proposed to compare c...
Parallel study of three very different languages (Hungarian, German and English) using text corpora ...
As the quality and availability of corpora of lesser-documented languages grow...
When multilingual corpora are used in translation studies, it is usually assumed that they are eithe...
Wikipedia has been used as a source of comparable texts for a range of tasks, such as Statistical Ma...
Over the past decade, rapid technological evolution has revolutionised the study of language; we hav...
In a recent article, Meylan and Griffiths (Meylan & Griffiths, 2021, henceforth, M&G) focus their at...
Corpus data have emerged as the raw data/benchmark for several NLP applications. Corpus is described...
This paper reports on the efforts of twelve national teams in building the International Comparable ...
This study investigates (and compares) the impact of the size and the similarity/quality of comparab...