The focus of this thesis is on re-using translations in natural language processing. It involves the collection of documents and their translations in an appropriate format, the automatic extraction of translation data, and the application of the extracted data to different tasks in natural language processing. Five parallel corpora containing more than 35 million words in 60 languages have been collected within co-operative projects. All corpora are sentence aligned and parts of them have been analyzed automatically and annotated with linguistic markup. Lexical data are extracted from the corpora by means of word alignment. Two automatic word alignment systems have been developed, the Uppsala Word Aligner (UWA) and the Clue Aligner. UWA im...
Tools assisting professional translators memorise translated sentences but provide limited functiona...
A bitext is a merged document composed of two versions of a given text, usuall...
Recent work has demonstrated the importance of dealing with Multi-Word Terms (...
As empirical methods have come to the fore in multilingual language technology and translation studi...
This paper investigates the role of parallel corpora as a linguistic resource. The appli...
There has recently been an increasing awareness of the importance of large collections of texts (cor...
In this paper we first give an overview of parallel corpus annotation, alignment and retrieval. We p...
In machine translation, the alignment of corpora has evolved into a mature research area, aimed at p...
In this work, an extensible word-alignment framework is implemented from scratch. It is based on a d...
This thesis addresses two closely related problems. The first, translation alignment, consists of id...
For the purpose of this descriptive translation study, a translation corpus was built from roughly t...
In recent years statistical word alignment models have been widely used for various Natural Language P...
Bilingual lexicons of multiword expressions play a vital role in several natural language processing...
Sentence alignment represents the basis for computer-assisted translation (CAT), terminology managem...
© Springer-Verlag. Abstract. Most methods to extract bilingual lexicons from parallel corpora learn...