Recent work has established the efficacy of Amazon’s Mechanical Turk for constructing parallel corpora for machine translation research. We apply this to building a collection of parallel corpora between English and six languages from the Indian subcontinent: Bengali, Hindi, Malayalam, Tamil, Telugu, and Urdu. These languages are low-resource, under-studied, and exhibit linguistic phenomena that are difficult for machine translation. We conduct a variety of baseline experiments and analysis, and release the data to the community.
Machine Translation pertains to translation of one natural language to other by using autom...
Identifying translations from comparable corpora is a well-known problem with several applications, ...
Hindi and Urdu share a common phonology, morphology and grammar but are written in different script...
In this paper we present several parallel corpora for English↔Hindi and talk about their natures and...
Key to fast adaptation of language technologies for any language hinges on the availability of funda...
Parallel corpora are often injected with bilingual dictionaries for improved Indian language machine...
In recent years, the multilingual content over the internet has grown exponentially together with th...
Although there has been work in the field of machine translation for a few decades, the promising tr...
Importance of translation has been realized long way back, but mostly it was manual translation. Tra...
The work in the area of machine translation has been going on for last few decades but the promising...
Parallel corpora are often injected with bilingual lexical resources for improved Indian language ma...
Machine translation (MT) is a hard problem because of the highly complex, irregular and diverse natu...