Abstract: In this paper, we use a lexical method to perform sentence alignment on an English-Chinese corpus. Past research shows that dictionary-based alignment involves a large amount of word matching and many dictionary lookups. To address these two issues, we first restrict the range of candidate target sentences, based on the location of the source sentence relative to the beginning of the text. In addition, careful empirical selection of stop words, based on word frequencies in the source text, helps to reduce the number of dictionary lookups. Experimental results show that the amount of word matching can be reduced by 75% and the number of dictionary lookups by as much as 43% without sacrificing precision and recall. Another experiment was also done wi...
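The two reductions described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the window size, the stop-word threshold, or the scoring function, so `radius`, `top_k`, the proportional-position rule, the substring-count score, and the toy dictionary below are all assumptions made for illustration.

```python
from collections import Counter


def candidate_window(src_index, n_src, n_tgt, radius):
    """Target indices near the proportional position of the source sentence.

    Assumption: the expected target position scales linearly with the
    source position; `radius` is a hypothetical tuning parameter.
    """
    if n_src <= 1:
        center = 0
    else:
        center = round(src_index * (n_tgt - 1) / (n_src - 1))
    lo = max(0, center - radius)
    hi = min(n_tgt - 1, center + radius)
    return range(lo, hi + 1)


def frequent_stop_words(src_sentences, top_k):
    """Treat the `top_k` most frequent source words as stop words (assumed rule)."""
    counts = Counter(word for sent in src_sentences for word in sent)
    return {word for word, _ in counts.most_common(top_k)}


def align(src_sentences, tgt_sentences, dictionary, radius=1, top_k=1):
    """Pair each source sentence with the best-scoring target in its window.

    Source sentences are lists of English tokens; target sentences are raw
    Chinese strings, matched by substring so no segmentation is needed here.
    """
    stop_words = frequent_stop_words(src_sentences, top_k)
    alignment = []
    for i, src in enumerate(src_sentences):
        # Skipping stop words avoids dictionary lookups for them.
        content_words = [w for w in src if w not in stop_words]
        best_j, best_score = None, 0
        # Only targets inside the window are compared (less word matching).
        for j in candidate_window(i, len(src_sentences), len(tgt_sentences), radius):
            tgt = tgt_sentences[j]
            score = sum(
                1
                for w in content_words
                if any(t in tgt for t in dictionary.get(w, ()))
            )
            if score > best_score:
                best_j, best_score = j, score
        alignment.append((i, best_j))
    return alignment


# Toy data, invented for this sketch only.
SRC = [["the", "cat", "sleeps"], ["the", "dog", "runs"], ["the", "bird", "sings"]]
TGT = ["猫在睡觉", "狗在跑", "鸟在唱歌"]
DICTIONARY = {
    "cat": ["猫"], "dog": ["狗"], "bird": ["鸟"],
    "sleeps": ["睡觉"], "runs": ["跑"], "sings": ["唱歌"],
}
```

Calling `align(SRC, TGT, DICTIONARY)` compares each source sentence only against targets within one position of its expected location, and never looks up "the". A source sentence with no dictionary hit in its window is paired with `None`.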
In machine translation, the alignment of corpora has evolved into a mature research area, aimed at p...
Most current sentence alignment approaches adopt sentence length and cognate as the alignment featur...
Languages that have no explicit word delimiters often have to be segmented for statistical machine...
In this paper, we use a lexical method to do sentence alignment for an English-Chinese c...
Sentence alignment is most important for Chinese-English bilingual corpus alignment. This paper anal...
One of the bilingual corpus processing methods is the alignment of two languages on each linguistic ...
We describe our experience with automatic alignment of sentences in parallel English-Chinese texts. ...
Bilingual alignment is a crucial problem in the research of natural language processing, and word al...
This paper first describes an experiment to construct an English-Chinese parallel corpus, then apply...
Most of the current Chinese word alignment tasks often adopt word segmentation systems firstly to id...
In this paper, a method for the word alignment of English-Chinese corpus based on chunks is proposed...
Word alignment in bilingual or multilingual parallel corpora has been a challenging issue for natura...
We introduce a word alignment framework that facilitates the incorporation of syntax encoded in bil...
This paper presents an algorithm capable of identifying the translation for each word in a bilingual...
©1997 MIT. This paper presents an algorithm capable of identifying the translation for ea...