Tokenization is often considered a solved problem when it is reduced to identifying word boundaries and handling punctuation and white space. A high-quality outcome from this process is essential for the subsequent stages of an NLP pipeline (POS tagging, WSD). In this paper we claim that achieving this quality requires the tokenization disambiguation process to draw on all word-related linguistic, morphosyntactic, and semantic-level information, as necessary. We also claim that semantic disambiguation performs much better in a bilingual context than in a monolingual one. We then prove that, for disambiguation purposes, the bilingual text provided by high-profile online machine translation services performs at almost the same level as human-origina...
In the past few decades machine translation research has made major progress. A researcher now has a...
Part-of-speech (POS) tagging is one of the most basic and crucial tasks in Natural Language Processi...
(Automatic) machine translation (MT) is one of the most cha...
When comparing different tools in the field of natural language processing (NLP), the quality of the...
Training a statistical machine translation system starts with tokenizing a parallel corpus. Some lan...
In this paper we present an experiment to automatically generate annotated training corpora for a su...
Word alignments identify translational correspondences between words in a para...
One of the most important prior tasks for robust part-of-speech tagging is the correct to...
The traditional approach to information retrieval is based on using words as the indexing and search...
Technical-term translation represents one of the most difficult tasks for human translators since (1...
In this paper we apply distributional semantic information to document-level machine translation. We...
We use bilingual lexicon induction techniques, which learn translations from monolingual texts in t...
We present our semantic textual similarity approach to filtering a noisy web-crawled parallel corpus...
We present the first-ever results showing that tuning a machine translation system against a seman...
This article proposes a new method for word translation disambiguation, one that uses a machine-lear...