Tagging, the most crucial annotation of language resources, can still be challenging when the corpus is large and its data are not homogeneous. The Chinese Gigaword Corpus faces both challenges. The corpus contains roughly 1.12 billion Chinese characters from two heterogeneous sources: news from Taiwan and news from Mainland China. In other words, in addition to its size, the data also contain two variants of Chinese that are known to exhibit substantial linguistic differences. We use the Chinese Sketch Engine as the corpus query tool, through which the grammatical behaviours of the two heterogeneous resources can be captured and displayed in a unified web interface. In this paper, we report our answer to the two challenge...
corpus with a size of 2 million Chinese characters, named HuaYu, has been established. This paper firs...
This paper describes our system designed for the NLPCC 2015 shared task on Chinese word se...
We are constructing a Japanese-Chinese parallel corpus, which is part of the NICT Multilingual Cor...
With growing interest in Chinese Language Processing, numerous NLP tools (e.g., word segmenters, part...
From the perspective of structural linguistics, we explore paradigmatic and syntagmatic lexical rela...
We address the issue of consuming heterogeneous annotation data for Chinese word segmentation and pa...
Chinese word segmentation and part-of-speech tagging (S&T) are fundamental steps for more advanc...
This document describes the Part-of-Speech (POS) tagging guidelines for the Penn Chinese Treebank Pr...
At present, most corpora are annotated mainly with syntactic knowledge. In this paper, we attemp...
The Web provides a large-scale corpus for researchers to study real-world language usage. Develo...
In recent years, more and more NLP packages have become available to the public, and many of them are imp...
A parallel corpus is a valuable resource for cross-language information retrieval and data-driven natu...