The fuzziness of Chinese sentence boundaries makes discourse analysis more challenging. Moreover, many articles posted on the Internet lack punctuation marks altogether. In this paper, we collect documents written by masters as a reference corpus and propose a model to label the punctuation marks for a given text. Conditional random field (CRF) models trained on the corpus determine the correct delimiter (a comma or a full stop) between each pair of successive clauses. Different tagging schemes and various features from different linguistic levels are explored. The results show that our segmenter achieves an accuracy of 77.48% for plain text, which is close to the human performance of 81.18%. For the rich formatted text, our segmenter ...
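As a rough illustration of the labeling task described in this abstract, the sketch below turns already-punctuated reference text into a sequence of clauses paired with the delimiter that follows each one, which is the kind of training pair a sequence labeler such as a CRF could consume. The function name `clauses_and_labels` and the label strings `COMMA` and `STOP` are assumptions made for this example; the abstract does not specify the tagging schemes or features actually used.

```python
import re

# Hypothetical label names; the abstract only says "a comma or a full-stop".
COMMA, STOP = "COMMA", "STOP"

# Full-width and ASCII commas and full stops are treated as clause delimiters.
# Note: an ASCII "." inside a number would also split here; this is only a sketch.
_DELIMS = re.compile(r"([，,。.])")


def clauses_and_labels(text):
    """Return (clauses, labels), where labels[i] is the delimiter after clauses[i].

    Trailing text that is not followed by a delimiter is dropped,
    because it has no gold label.
    """
    parts = _DELIMS.split(text)  # alternates: clause, delimiter, clause, ...
    clauses, labels = [], []
    for i in range(0, len(parts) - 1, 2):
        clause, delim = parts[i].strip(), parts[i + 1]
        if not clause:
            continue  # skip empty clauses, e.g. between consecutive delimiters
        clauses.append(clause)
        labels.append(COMMA if delim in "，," else STOP)
    return clauses, labels


clauses, labels = clauses_and_labels("我来了，他走了。")
print(list(zip(clauses, labels)))
```

At prediction time the delimiters would be unknown, so a model would receive only the clause sequence and predict one label per clause boundary.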
Recently, natural language processing tasks are more frequently conducted over online content. This ...
Story segmentation of news broadcasts has been shown to improve the accuracy of the subsequent proce...
In this paper, we proposed a Chinese word segmentation model for micro-blog text. Although Conditio...
This paper presents a Chinese word segmentation system submitted to the closed training evaluations ...
In this paper, we explore the use of prosodic features in sentence boundary detection in Chinese br...
We describe a method for disambiguating Chinese commas that is central to Chinese sentence segment...
The fact that words are not conventionally demarcated in Chinese orthography makes the process of wo...
The notion of sentencehood in Mandarin Chinese is much less well-defined than in many other language...
Sentence segmentation is a fundamental issue in Classical Chinese language processing. To facilitate...
The Chinese language is written without using spaces or other word delimiters. Although a text may b...
We describe models of prosodic phrasing trained on multiple languages to identify boundaries in an u...
The technique of multi-level chunking has been applied to full sentence parsing in a number of previ...
There is rich knowledge encoded in on-line web data. For example, punctuation and entity tags in Wi...
The present study examined the use of statistical cues for word boundaries during Chinese r...