As more text becomes available in electronic form, it is becoming relatively easy to obtain digital texts together with their translations. This paper presents the processing steps needed to compile such texts into parallel corpora, an extremely useful language resource. Parallel corpora can serve as a translation aid for second-language learners, translators and lexicographers, or as a data source for various language technology tools. We present our work in this direction, which is characterised by the use of open standards for text annotation, the use of publicly available third-party tools, and the wide availability of the produced resources. We explain the corpus annotation chain, which involves normalisation, tokenisation, seg...
Large, uniformly encoded collections of texts, corpora, are an invaluable source of data, not only f...
Includes bibliographical references (pages 4-5). We describe some of the challenges in developing Engl...
This article describes NATools, a set of tools for processing, analysis and extraction...
Exchange between the translation studies and the computational linguistics communities has tradition...
This paper investigates the role of parallel corpora as a linguistic resource. The appli...
This chapter gives an overview of parallel corpora, i.e. corpora containing source texts in a given ...
This paper focuses on the description of the corpus «PEST-INTER» in five languages and the process o...
We report on methods to create the largest publicly available parallel corpora by crawling the web, ...
There has recently been an increasing awareness of the importance of large collections of texts (cor...
In this paper we first give an overview of parallel corpus annotation, alignment and retrieval. We p...
In this article we illustrate and evaluate an approach to the creation of high quality linguisticall...
Multilingual resources are useful for linguistic studies, translation, and many other tasks. Unfortu...
This paper discusses the role played by parallel corpora in the design and implementation of fully a...