In this paper, we question the homogeneity of a large parallel corpus by measuring the similarity between various sub-parts. We compare the results obtained with a general measure of lexical similarity based on χ² with those obtained by counting the number of discourse connectives. We argue that discourse connectives provide a more sensitive measure, revealing differences that are not visible with the general measure. We also provide evidence for the existence of specific characteristics defining translated texts as opposed to non-translated ones, due to a universal tendency for explicitation.
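The two kinds of measure named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a common χ² formulation over the most frequent words of the two sub-corpora taken together, a naive tokenizer, and single-word connectives only. The function names, the `top_n` cutoff, and the normalisation per 1,000 tokens are all hypothetical choices made for illustration.

```python
from collections import Counter

PUNCT = ".,;:!?()\"'"

def tokenize(text):
    # Naive whitespace tokenizer (an assumption, not the paper's preprocessing).
    stripped = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in stripped if w]

def chi_square_distance(tokens_a, tokens_b, top_n=500):
    # Sum of (observed - expected)^2 / expected over the top_n most frequent
    # words of the joint corpus. Expected counts assume both sub-corpora are
    # samples of the same population; a higher score means less similar.
    if not tokens_a or not tokens_b:
        raise ValueError("both sub-corpora must be non-empty")
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    size_a, size_b = len(tokens_a), len(tokens_b)
    total = size_a + size_b
    score = 0.0
    for word, joint_count in (freq_a + freq_b).most_common(top_n):
        expected_a = joint_count * size_a / total
        expected_b = joint_count * size_b / total
        score += (freq_a[word] - expected_a) ** 2 / expected_a
        score += (freq_b[word] - expected_b) ** 2 / expected_b
    return score

def connective_rate(tokens, connectives, per=1000):
    # Occurrences of single-word connectives per `per` tokens.
    # Multi-word connectives (e.g. "in fact") are not handled in this sketch.
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in connectives)
    return per * hits / len(tokens)
```

Under these assumptions, two sub-corpora with identical word distributions score 0 on the χ² measure, while `connective_rate` can still be compared across sub-corpora for a chosen connective list.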
Statements like ‘Word X of language A is translated with word Y of language B’ are incorrect, althou...
This research addresses the problem of deriving semantic similarity between words of language using ...
Following the pioneering work by \cite{Li-Gaussier-10}, we address in this pap...
For almost two decades now, mainstream corpus-based research in descriptive translation studi...
Describing, comparing and evaluating corpora are key issues in corpus-based translation and corpus l...
The recent emergence of large parallel corpora has represented a leap ahead for cross-linguistic and...
Quantifying the similarity or dissimilarity between documents is an important task in authorship att...
Wikipedia articles in different languages have been mined to support various tasks, such as Cross-La...
Corpus size has traditionally been measured in number of words. Working with a single (European) lan...
This paper surveys the strategies that the Contrastive, Typological, and Translation Mining parallel...
Wikipedia has been used as a source of comparable texts for a range of tasks, such as Statistical Ma...
When carrying out a cross-linguistic study, the first step is “to make sure that you are comparing l...
One of the most innovative strands of (corpus-based) research in translation studies in the last few...