This paper explores corpus-based bilingual retrieval where the translation corpora used vary by source and size. We find that the quality of translation alignments and the domain of the bitext are important. In some settings these factors are more critical than corpus size. We also show that judicious choice of tokenization can reduce the amount of bitext required to obtain good bilingual retrieval performance.
In this paper we describe a method for detecting terminological variants and their translations in b...
In this paper, we describe a bilingual corpus processing strategy, block analysis, from a new point ...
We propose a method to improve the accuracy of parsing bilingual texts (bitexts) with the help of st...
The traditional approach to information retrieval is based on using words as the indexing and search...
This paper introduces some key aspects of machine translation in order to situate the role of the bi...
In this article I discuss the role of translated texts in different types of corpora. I first consid...
This study investigates (and compares) the impact of the size and the similarity/quality of comparab...
This chapter describes translation-relevant types of corpora and the main ways in which they can be ...
Cross-lingual word embeddings are an increasingly important resource in cross-lingual methods for N...
A bitext is a merged document composed of two versions of a given text, usuall...
Corpus size has traditionally been measured in number of words. Working with a single (European) lan...
For endangered languages, data collection campaigns have to accommodate the ch...
The main work in bilingual lexicon extraction from comparable corpora is based...
Statistically training a machine translation model requires a parallel corpus containing a huge amo...