We have collected English-Tamil bilingual data from publicly available websites for NLP research involving Tamil. A standard set of processing steps was applied to the raw web data to produce a sentence-aligned English-Tamil parallel corpus suitable for various NLP tasks. The parallel corpora cover texts from the Bible, cinema, and news domains.
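The abstract does not say which aligner turned the raw web text into sentence pairs. A widely used option for this step is the length-based method of Gale & Church (1993). The sketch below illustrates that technique only; it is not necessarily what the EnTam authors used. It assumes the text is already split into sentences. It scores each candidate "bead" (1-1, 1-0, 0-1, 2-1, 1-2, 2-2) by its prior probability plus a Gaussian penalty on the character-length mismatch, then finds the cheapest monotone path with dynamic programming. The names `PRIORS`, `C`, `S2`, `bead_cost` and `align` are illustrative. The constants are the values commonly quoted for this method and should be treated as tunable assumptions, especially for a script pair like English-Tamil.

```python
import math
from typing import Dict, List, Optional, Tuple

# Prior probability of each bead type (n_source, n_target), as commonly quoted
# for Gale & Church; treat these as tunable assumptions, not measured values.
PRIORS: Dict[Tuple[int, int], float] = {
    (1, 1): 0.89, (1, 0): 0.0099, (0, 1): 0.0099,
    (2, 1): 0.089, (1, 2): 0.089, (2, 2): 0.011,
}
C = 1.0   # assumed expected target characters per source character
S2 = 6.8  # assumed variance of that length ratio


def bead_cost(src_len: int, tgt_len: int, bead: Tuple[int, int]) -> float:
    """Negative log-probability of a bead given its total character lengths."""
    mean = (src_len + tgt_len / C) / 2.0
    delta = 0.0 if mean == 0 else (tgt_len - src_len * C) / math.sqrt(mean * S2)
    # Two-tailed tail probability of a length mismatch at least this large;
    # floored so that log() never receives zero.
    p_len = max(math.erfc(abs(delta) / math.sqrt(2.0)), 1e-300)
    return -math.log(PRIORS[bead]) - math.log(p_len)


def align(src: List[str], tgt: List[str]) -> List[Tuple[List[int], List[int]]]:
    """Return the lowest-cost monotone alignment as (source idx, target idx) beads."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order guarantees every predecessor cell is already final.
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            for di, dj in PRIORS:
                pi, pj = i - di, j - dj
                if pi < 0 or pj < 0 or cost[pi][pj] == inf:
                    continue
                src_len = sum(len(s) for s in src[pi:i])
                tgt_len = sum(len(t) for t in tgt[pj:j])
                total = cost[pi][pj] + bead_cost(src_len, tgt_len, (di, dj))
                if total < cost[i][j]:
                    cost[i][j] = total
                    back[i][j] = (di, dj)
    # Follow back-pointers from the bottom-right cell to recover the beads.
    beads: List[Tuple[List[int], List[int]]] = []
    i, j = n, m
    while i > 0 or j > 0:
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    beads.reverse()
    return beads


if __name__ == "__main__":
    src = ["The cat sat on the mat and then it slept."]
    tgt = ["The cat sat on the mat.", "Then it slept."]
    print(align(src, tgt))  # one 1-2 bead: source 0 <-> targets 0 and 1
```

The method uses only sentence lengths, so it is fast and needs no language resources. The trade-off is that it can drift when many sentences have similar lengths. Real pipelines often add anchor points or lexical cues on top of it.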
The EMILLE/CIIL Corpus consists of three components: monolingual, parallel and annotated corpora. Th...
This paper describes our submission for the English-Tamil news translation task of WMT-2020. The var...
The paper is about developing an aligned English-Myanmar parallel corpus. This paper will describe m...
EnTam is a sentence-aligned English-Tamil bilingual corpus from some of the publicly available websi...
English-Hindi parallel corpus collected from several sources. Tokenized and sentence-aligned. A part...
A sentence-aligned parallel corpus is an important prerequisite in statistical machine translation. ...
In this paper we present several parallel corpora for English↔Hindi and discuss their nature and...
A parallel corpus aligned at both the sentence and word level is an important prerequisite in statistica...
A corpus is a large collection of homogeneous and authentic written texts (or speech) of a particular ...
Word alignment in bilingual corpora has been a very active research topic in the Machine Translation...
HindEnCorp parallel texts (sentence-aligned) come from the following sources: Tides, which contains...
Various experiments from literature suggest that in statistical machine translation (SMT), applying ...
Corpus-based techniques in machine translation involve parallel corpora, but it is not applicab...
A parallel corpus is a critical resource in machine-learning-based translation. The task of collecting...
Recent work has established the efficacy of Amazon’s Mechanical Turk for constructing parallel corpo...