The paper presents the methodology and outcome of compiling and processing the Bulgarian X-language Parallel Corpus (Bul-X-Cor), which was integrated into the Bulgarian National Corpus (BulNC). We focus on building representative parallel corpora that include a diversity of domains and genres, reflect the relations between Bulgarian and other languages, and are consistent in terms of compilation methodology, text representation, metadata description and annotation conventions. The approaches implemented in the construction of Bul-X-Cor include using readily available text collections on the web, manual compilation (by means of Internet browsing) and, preferably, automatic compilation (by means of web crawling – general a...
Multilingual digital resources with Bulgarian language. The paper presents in brief Bulgarian language...
The main goal of the project is to create the English-Polish-Belarusian Literary Parallel Corpus (EP...
With more and more text being available in electronic form, it is becoming relatively easy to obtain...
The paper presents the Bulgarian National Corpus project (BulNC), a large-scale, representative, online ...
The paper discusses several key concepts related to the development of corpora and reconsiders them ...
The Bulgarian-Polish-Russian parallel corpus. The Semantics Laboratory Team of Institute of Slavic S...
The Bulgarian-English parallel corpus MaCoCu-bg-en 1.0 was built by crawling the ".bg" and ".бг" int...
This paper focuses on the description of the corpus «PEST-INTER» in five languages and the process o...
The article briefly reviews the bilingual Slovak-Bulgarian/Bulgarian-Slovak parallel and aligned corpus....
This paper presents the current status of the Latvian-Russian parallel corpus, which is an ongoing p...
The Slovene-English parallel corpus MaCoCu-sl-en 1.0 was built by crawling the ".si" internet top-le...
The paper reports on our ongoing work on the creation of a corpus of Bulgarian and Ukrainian para...
Large, uniformly encoded collections of texts, corpora, are an invaluable source of data, not only f...
We present a Swedish-Turkish parallel corpus and the automatic annotation procedure with tools that ...
This paper presents a linguistic processing pipeline for Bulgarian including morphological analysis,...