The NTU-MC is a multilingual corpus that taps into the multilingual text available in Singapore. The current version of the NTU-MC contains a total of ~375,000 words (15,096 sentences) in 6 languages (English, Chinese, Japanese, Korean, Indonesian and Vietnamese) from 6 language families (Indo-European, Japonic, Austro-Asiatic, Sino-Tibetan, Austronesian and Korean as a language isolate); all text in English, Chinese, Japanese, Korean and Vietnamese was part-of-speech (POS) tagged. This project focuses on compiling the foundation text for the NTU-MC, and this dissertation describes the motivations, the corpus compilation process, and the internal and cross-corpus evaluation of the corpus output. The corpus will be made ...
A corpus is a systematic collection of natural language data in computerized format. The availabi...
The EMILLE Project (Enabling Minority Language Engineering) was established to construct a 67 millio...
Contemporary information technologies and mathematical modelling have made creating corpora of natura...
The Formosa speech database (ForSDat) is a multilingual speech corpus collected at Chang Gung Univer...
In Taiwan, most people speak Mandarin, Southern Min, or Hakka. Not only are the three Chinese dialec...
In natural language processing (NLP), a multilingual corpus is a necessary resource. The quali...
In the development of language technologies such as machine translation, speech recognition, and oth...
The Multilingual Student Translation (MUST) corpus is a corpus of translations produced by foreign l...
This paper describes the acquisition, preparation and properties of a corpus extracted from the offi...
Generally, a corpus serves as the source of data for various types of research. As such, there are a...
We are constructing a Japanese-Chinese parallel corpus, which is a part of the NICT Multilingual Cor...
research project included constructing a 500,000-word English-Malay parallel corpus of legal texts, ...
A parallel corpus is a valuable resource for cross-language information retrieval and data-driven natu...