In this paper, we question the homogeneity of a large parallel corpus by measuring the similarity between its various sub-parts. We compare the results obtained with a general measure of lexical similarity based on χ² against those obtained by counting discourse connectives. We argue that discourse connectives provide a more sensitive measure, revealing differences that are not visible with the general measure. We also provide evidence that translated texts have specific characteristics distinguishing them from non-translated ones, due to a universal tendency for explicitation.
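The two measures contrasted above can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption: the χ² statistic is computed over the most frequent words of the two sub-corpora combined (one common way of comparing corpora), connectives are counted as single tokens normalised per 1,000 tokens, and the function names, the `top_n` parameter, and the connective list are hypothetical.

```python
from collections import Counter


def chi_square_distance(tokens_a, tokens_b, top_n=500):
    """Chi-square statistic over the top_n most frequent words of both
    sub-corpora combined. Lower values mean more similar word frequencies.
    Assumed formulation; the paper's exact measure may differ."""
    if not tokens_a or not tokens_b:
        raise ValueError("both sub-corpora must be non-empty")
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    size_a, size_b = len(tokens_a), len(tokens_b)
    joint = counts_a + counts_b
    chi2 = 0.0
    for word, total in joint.most_common(top_n):
        # Expected counts if the word were spread evenly by corpus size.
        expected_a = total * size_a / (size_a + size_b)
        expected_b = total * size_b / (size_a + size_b)
        chi2 += (counts_a[word] - expected_a) ** 2 / expected_a
        chi2 += (counts_b[word] - expected_b) ** 2 / expected_b
    return chi2


def connective_rate(tokens, connectives):
    """Occurrences of single-word connectives per 1,000 tokens."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in connectives)
    return 1000.0 * hits / len(tokens)


# Hypothetical toy data, for illustration only.
original = "the cat sat however the dog left".split()
translated = "the cat sat the dog left the end".split()
connectives = {"however", "therefore", "because"}

print(chi_square_distance(original, translated))
print(connective_rate(original, connectives))
print(connective_rate(translated, connectives))
```

In this sketch the χ² value aggregates over all frequent words, whereas the connective rate isolates a single class of items, which is one way to picture how a targeted count can expose differences that a general measure averages out.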
We explore the usability of different bilingual corpora for the purpose of multilingual and cross-li...
This paper aims at investigating the use of textual distributional similarity measures in the contex...
Lexical co-occurrence counts from large corpora have been used to construct high-dimensional vector-s...
Describing, comparing and evaluating corpora are key issues in corpus-based translation and corpus l...
For almost two decades now, mainstream corpus-based research in descriptive translation studi...
This research addresses the problem of deriving semantic similarity between words of language using ...
Corpus size has traditionally been measured in number of words. Working with a single (European) lan...
Quantifying the similarity or dissimilarity between documents is an important task in authorship att...
This paper surveys the strategies that the Contrastive, Typological, and Translation Mining parallel...
Corpus linguistics lacks strategies for describing and comparing corpora. Currently most descriptio...
The recent emergence of large parallel corpora has represented a leap ahead for cross-linguistic and...
Decisions at the outset of compiling a comparable corpus are of crucial importance for how the corpu...
The primary goal of the present study is to find an adequate method for the quantitative analysis of ...
We measure the number of true proportional analogies between chunks in two typ...