A set of corpora for 120 languages automatically collected from Wikipedia and the web. Collected using the W2C toolset: http://hdl.handle.net/11858/00-097C-0000-0022-60D6-
A corpus is a collection of texts in electronic form that are the object of literary or linguistic s...
Content: List of the 743 domains, their term vocabularies in 10 languages, and the Wikipedia articl...
This article introduces ukWaC, deWaC and itWaC, three very large corpora of English, German, and Ita...
A tool used to build multilingual corpora from Wikipedia. Download the web pages, convert them to pl...
We have built a corpus containing texts in 106 languages from texts available on the Internet and on...
This thesis introduces the W2C Corpus which contains 97 languages with more than 10 million words fo...
For a variety of reasons, getting started in corpus linguistics is difficult. The starting point for ...
Wikipedia Corpus is a bilingual—Spanish-English—single-label corpus composed of 3,019 documents abou...
We investigate the potential of using the web as a huge corpus for language studies. We test the hyp...
This archive contains a collection of language corpora. These are text files that contain samples of...