This paper describes the acquisition, preparation and properties of a corpus extracted from the official documents of the United Nations (UN). The corpus is available in all six official languages of the UN and contains around 300 million words per language. We describe the methods we used for crawling, document formatting, and sentence alignment. The corpus also includes a common test set for machine translation. We present the results of a French-Chinese machine translation experiment performed on this corpus.
We present a new, unique and freely available parallel corpus containing European Union (EU) documen...
We are constructing a Japanese-Chinese parallel corpus, which is a part of the NICT Multilingual Cor...
This paper first describes an experiment to construct an English-Chinese parallel corpus, then apply...
MultiUN is a multilingual parallel corpus extracted from the official documents of the United Nation...
In this paper we describe a six-ways parallel public-domain corpus consisting of 2100 United Nations...
This paper describes a corpus of nearly 10K French-Chinese aligned segments, p...
Parallel corpus is a valuable resource for cross-language information retrieval and data-driven natu...
The NTU-MC is a multilingual corpus that taps on the availability of multilingual text available in ...
The NTU-MC is a multilingual corpus that taps on the availability of multilingual text available in ...
Integrating Natural Language Processing (NLP) and computer vision is a promising effort. However, th...
We present Multilingual Open Text (MOT), a new multilingual corpus containing text in 44 languages, ...
A bitext is a merged document composed of two versions of a given text, usuall...
The NTU-MC is a multilingual corpus that taps on the availability of multilingual text available in ...
The correct interpretation of Multiword Units (MWUs) is crucial to many applications in Natural Lang...
The correct interpretation of Multiword Units (MWUs) is crucial to many applications in Natural Lang...