The paper discusses several key concepts related to the development of corpora and reconsiders them in light of recent developments in NLP. On the basis of an overview of present-day corpora, we conclude that the dominant practices of corpus design do not make adequate use of these technologies and, as a result, fail to meet the demands of corpus linguistics, computational lexicology and computational linguistics alike. We proceed to lay out a data-driven approach to corpus design, which integrates the best practices of traditional corpus linguistics with the potential of the latest technologies that allow fast collection, automatic metadata description and annotation of large amounts of data. Thus, the gist of the approach we propose is that cor...
The paper focuses on the construction of valence lexicons for Bulgarian. It presents two approaches:...
The Bulgarian-Polish-Russian parallel corpus. The Semantics Laboratory Team of Institute of Slavic S...
The paper reports on our ongoing work on the creation of a corpus of Bulgarian and Ukrainian para...
The paper presents the Bulgarian National Corpus project (BulNC), a large-scale, representative, online ...
The paper presents the methodology and the outcome of the compilation and the processing of the Bulg...
Multilingual digital resources with Bulgarian language. The paper presents in brief Bulgarian language...
HPSG-based annotation including: constituent structure, dependency relations, named entities (classi...
This paper presents a linguistic processing pipeline for Bulgarian including morphological analysis,...
Contemporary information technologies and mathematical modelling have made creating corpora of natura...
In this paper we report on the progress in the creation of an Ontology-based lexicon for Bulgarian. ...
In the last decade, corpus linguistics has finally established itself as a separate research startin...
Bulgarian sense-annotated corpus – between the tradition and novelty. The Bulgarian Sense-annotated ...
The article briefly reviews a bilingual Slovak-Bulgarian/Bulgarian-Slovak parallel and aligned corpus....
The paper presents the long-standing tradition of Romanian corpus acquisition and processing, ...
In this paper we present the process of designing an efficient speech corpus for the first unit sele...