With growing interest in Chinese Language Processing, numerous NLP tools (e.g., word segmenters, part-of-speech taggers, and parsers) for Chinese have been developed all over the world. However, since no large-scale bracketed corpora are available to the public, these tools are trained on corpora with different segmentation criteria, part-of-speech tagsets, and bracketing guidelines; as a result, comparisons between them are difficult. As a first step towards addressing this issue, we have been preparing a 100,000-word bracketed corpus since late 1998 and plan to release it to the public in summer 2000. In this paper, we address several challenges in building the corpus, namely, creating annotation guidelines, ensuring annotation accuracy and...
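A small sketch can show why tools trained under different segmentation criteria are hard to compare. It assumes the common span-based precision/recall/F1 measure for word segmentation, which is not a method stated in this abstract. The sentence and both segmentations are hypothetical. The sketch scores one criterion's output against the other and gets an F1 of only 0.5, even though neither segmentation is an error under its own guidelines.

```python
def to_spans(words):
    """Convert a list of words into a set of (start, end) character offsets."""
    spans = set()
    start = 0
    for w in words:
        end = start + len(w)
        spans.add((start, end))
        start = end
    return spans


def seg_prf(gold, pred):
    """Span-based precision, recall, and F1 of `pred` against `gold`.

    A predicted word counts as correct only if its exact character span
    also appears in the gold segmentation.
    """
    if "".join(gold) != "".join(pred):
        raise ValueError("segmentations must cover the same character string")
    g, p = to_spans(gold), to_spans(pred)
    correct = len(g & p)
    precision = correct / len(p)
    recall = correct / len(g)
    f1 = 0.0 if correct == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical criterion A keeps the proper name as one word;
    # hypothetical criterion B splits it into smaller words.
    criterion_a = ["中华人民共和国", "成立", "了"]
    criterion_b = ["中华", "人民", "共和国", "成立", "了"]
    p, r, f = seg_prf(criterion_a, criterion_b)
    print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=0.400 R=0.667 F1=0.500
```

Only the spans for 成立 and 了 agree, so 2 of B's 5 words and 2 of A's 3 words match. The low score reflects a disagreement between guidelines rather than a tool error, which is the comparability problem the abstract describes.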
We are constructing a Japanese-Chinese parallel corpus, which is a part of the NICT Multilingual Cor...
Chinese word segmentation and part-of-speech tagging (S&T) are fundamental steps for more advanc...
Textual information written in Chinese now represents a huge knowledge repository. The first step of...
With growing interest in Chinese Language Processing, numerous NLP tools (e.g., word segmenters, part...
With growing interest in Chinese Language Processing, numerous NLP tools (e.g., word segmenters, par...
This document describes the segmentation guidelines for the Penn Chinese Treebank Project. The goal ...
This document describes the bracketing guidelines for the Penn Chinese Treebank Project. The goal of...
We address the issue of consuming heterogeneous annotation data for Chinese word segmentation and pa...
This document describes the Part-of-Speech (POS) tagging guidelines for the Penn Chinese Treebank Pr...
Tagging, as the most crucial annotation of language resources, can still be challenging when the corpu...
At present, most corpora are annotated mainly with syntactic knowledge. In this paper, we attemp...
This paper presents the building procedure of a Chinese sense annotated corpus. A set of software to...
Background: Chinese word segmentation (CWS) and part-of-speech (POS) tagging are two fundame...