Documents in languages such as Chinese, Japanese and Korean sometimes annotate terms with their translations in English inside a pair of parentheses. We present a method to extract such translations from a large collection of web documents by building a partially parallel corpus and using a word alignment algorithm to identify the terms being translated. The method is able to generalize across the translations for different terms and can reliably extract translations that occurred only once in the entire web. Our experiment on Chinese web pages produced more than 26 million pairs of translations, which is over two orders of magnitude more than previous results. We show that the addition of the extracted translation pairs as training data...
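To make the first step of this method concrete, here is a minimal sketch of how candidate pairs for a partially parallel corpus might be collected. It pairs the run of Chinese characters just before a parenthesis with the English text inside it. The function name `extract_candidates`, the regular expressions, the CJK range (basic block U+4E00–U+9FFF only) and the default `window` of 10 characters are all hypothetical choices for illustration. None of them comes from the abstract. The sketch does not implement the word alignment step, which the abstract says is used to decide which suffix of the context is the translated term.

```python
import re
from typing import List, Tuple

# Hypothetical pattern: an English annotation inside ASCII "()" or full-width "（）"
# parentheses, starting with a Latin letter and containing letters, digits,
# spaces and a few punctuation marks.
PAREN_EN = re.compile(r"[(（]\s*([A-Za-z][A-Za-z0-9 .,'&-]*?)\s*[)）]")

# A single CJK Unified Ideograph (basic block only; an assumption for this sketch).
CJK_CHAR = re.compile(r"[\u4e00-\u9fff]")


def extract_candidates(text: str, window: int = 10) -> List[Tuple[str, str]]:
    """Return (chinese_context, english_annotation) candidate pairs.

    The context is at most `window` consecutive CJK characters immediately
    preceding the opening parenthesis. The actual term is some suffix of this
    context; choosing that suffix (via word alignment in the described method)
    is NOT done here.
    """
    pairs = []
    for m in PAREN_EN.finditer(text):
        prefix = text[:m.start()]
        i = len(prefix)
        # Walk backwards over consecutive CJK characters, up to `window` of them.
        while i > 0 and len(prefix) - i < window and CJK_CHAR.match(prefix[i - 1]):
            i -= 1
        context = prefix[i:]
        if context:  # skip annotations not preceded by Chinese text
            pairs.append((context, m.group(1)))
    return pairs


if __name__ == "__main__":
    print(extract_candidates("我们使用支持向量机（Support Vector Machine）进行分类。"))
```

Each context here over-covers the term. The run "我们使用支持向量机" includes "我们使用" ("we use") in front of 支持向量机 (Support Vector Machine). A fixed window cannot trim this reliably, which is why the abstract relies on word alignment across many such pairs to identify the exact term.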
© Springer-Verlag. Abstract. Most methods to extract bilingual lexicons from parallel corpora learn...
Word alignment in bilingual or multilingual parallel corpora has been a challenging issue for natura...
This paper describes a method for searching word correspondences between pairs of translation sente...
This paper describes a system that automatically mines English-Chinese translation pairs from large ...
Mining translations from abundant Web data can be applied in many fields such as computer assisted l...
In recent years, state-of-the-art cross-linguistic systems have been based on parallel corpora. Neve...
Parallel corpora are a crucial resource in research fields such as cross-lingual information retrie...
New words such as names, technical terms, etc., appear frequently. As such, the bilingual lexicon of a...
In this paper, we propose a novel system for translating organization names from Chinese to English ...
Parallel corpora are critical resources for machine translation research and development since paral...
We introduce a method for learning to find domain-specific translations for a given term...
We present a method for learning to find English to Chinese transliterations on the Web. In our appr...
Mining bilingual data (including bilingual sentences and terms) from the Web can benefit many NLP...
We report experimental results on automatic extraction of an English-Chinese translation lexicon, by...