The NTU-MC is a multilingual corpus that taps into the multilingual text available in Singapore. The current version of the NTU-MC contains a total of ~375,000 words (15,096 sentences) in 6 languages (English, Chinese, Japanese, Korean, Indonesian and Vietnamese) from 6 language families (Indo-European, Sino-Tibetan, Japonic, Korean as a language isolate, Austronesian and Austro-Asiatic); all text in English, Chinese, Japanese, Korean and Vietnamese was part-of-speech (POS) tagged. This project focuses on compiling the foundation text for the NTU-MC, and this dissertation describes the motivations, the corpus compilation process, and the internal and cross-corpus evaluation of the corpus output. The corpus will be made ...
A parallel corpus is a valuable resource for cross-language information retrieval and data-driven natu...
Here, we describe an efficient algorithm to select phonetically balanced scripts for col...
This paper first discusses standards for developing Asian language corpora so as to facilitate inter...
The Formosa speech database (ForSDat) is a multilingual speech corpus collected at Chang Gung Univer...
In Taiwan, most people speak Mandarin, Southern Min, or Hakka. Not only are the three Chinese dialec...
Today, corpora play an important role in the development and evaluation of language and speech technologies...
This paper describes the acquisition, preparation and properties of a corpus extracted from the offi...
In the development of language technologies such as machine translation, speech recognition, and oth...
In natural language processing (NLP), multilingual corpora are a necessary resource. The quali...
We are constructing a Japanese-Chinese parallel corpus, which is a part of the NICT Multilingual Cor...
A corpus is a systematic collection of natural language data in computerized format. The availabi...
research project included constructing a 500,000-word English-Malay parallel corpus of legal texts, ...
The Multilingual Student Translation (MUST) corpus is a corpus of translations produced by foreign l...