Parallel study of three very different languages (Hungarian, German and English) using text corpora of similar size makes it possible to explore both similarities and differences. Corpora of publicly available Internet sources were used. The corpus size was the same (approx. 20 Mbytes, 2.5-3.5 million word forms) for all languages. Besides traditional corpus coverage, word length and occurrence statistics, some new features about prosodic boundaries (sentence-initial and sentence-final positions, preceding and following a comma) were also computed. Among other findings, the coverage of corpora by the most frequent words follows a parallel logarithmic rule for all languages in the 40-85% coverage range. The functions are much ...
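The coverage statistic mentioned above can be illustrated with a minimal sketch. It assumes a simple regex tokenizer and lowercasing, neither of which is described in the abstract, and the function names `coverage_curve` and `words_needed` are hypothetical. It is not the authors' procedure, only one plausible way to compute how much of a corpus the N most frequent word forms cover.

```python
import re
from collections import Counter


def coverage_curve(text):
    """Return cumulative corpus coverage after each frequency rank.

    Element i is the fraction of all tokens covered by the (i + 1)
    most frequent word forms. Tokenization via \\w+ is an assumption.
    """
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    running = 0
    curve = []
    # most_common() yields word forms in descending frequency order
    for _, freq in counts.most_common():
        running += freq
        curve.append(running / total)
    return curve


def words_needed(curve, target):
    """Return the smallest rank whose coverage reaches `target`, or None."""
    for rank, covered in enumerate(curve, start=1):
        if covered >= target:
            return rank
    return None


sample = "the cat and the dog and the bird"
curve = coverage_curve(sample)
rank_for_60_percent = words_needed(curve, 0.60)
```

Plotting the curve against the logarithm of the rank for each language would be one way to inspect whether the 40-85% range looks roughly linear, as the abstract reports.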
In this article, we study the differences between the European languages using stat...
Europarl is a large multilingual corpus containing the minutes of the debates at the European Parlia...
This paper takes a holistic view of the field of corpus-based linguistic typology and presents an...
Corpus size has traditionally been measured in number of words. Working with a single (European) lan...
This volume sets out to give a voice to a range of less frequently studied European languages from P...
This paper reports on the efforts of twelve national teams in building the International Comparable ...
This dissertation centers around the question whether syntactic differences between languages can be...
Since 2011 the comprehensive, electronically available sources of the Leipzig Corpora Collection hav...
How to find words with the same meaning in different language registers? After...
Equating corpus sizes (left) resulted in average word frequencies that were comparable across lan...
This article introduces ukWaC, deWaC and itWaC, three very large corpora of English, German, and Ita...
Samlowski B, Möbius B, Wagner P. Comparing syllable frequencies in corpora of written and spoken lan...
One of the many practical applications of corpus studies is the generation of word frequency informa...
The present paper aims to show how a cross-linguistic analysis based on a parallel corpus can be use...