Following the pioneering work by \cite{Li-Gaussier-10}, we address in this paper the analysis of a family of quantitative comparability measures dedicated to the construction and evaluation of topical comparable corpora. After recalling the definition of the quantitative comparability measure proposed by \cite{Li-Gaussier-10}, we develop some variants of this measure based primarily on the consideration that the occurrence frequencies of lexical entries and the number of their translations are important. We compare the respective advantages and disadvantages of these variants in the context of an evaluation framework that is based on the progressive degradation of the Europarl parallel corpus. The degradation is obtain...
In this paper, we question the homogeneity of a large parallel corpus by measuring the similarity be...
Parallel corpora are not available for all domains and languages, but statistical methods in...
In the presence of bilingual comparable corpora it is natural to embed the data in two distinct ling...
We study in this chapter the problem of measuring the degree of comparability ...
Thematic comparable corpora group texts on the same topic written in several languages, highly...
Bilingual corpora are essential resources for overcoming the language barrier ...
This work is motivated by the desire to create a new part-of-speech annotated ...
When multilingual corpora are used in translation studies, it is usually assumed that they are eithe...
In this paper we present a metric that measures comparability of documents across different language...
Bilingual corpora are an essential resource used to cross the language barrier in multilingual Natur...
Corpus Resources for Descriptive and Applied Studies. Current Challenges and Future Directions: Sele...
In this paper, we present a model of statistical word-level mapping for comparable corpora. The appr...
This thesis examines the possibility of using comparable corpora to augment statistical models of tr...
Corpus size has traditionally been measured in number of words. Working with a single (European) lan...