We address the problem of statistical language modeling in the context of PinYin-to-Chinese (PTC) conversion, a problem similar to speech recognition but without the acoustic recognition step. Input phonetic syllables were first segmented and converted into a word lattice, which was then scored within a source-channel framework to find the most probable Chinese sentence. In particular, we discuss the use of a Whole Sentence Maximum Entropy (WSME) model, an expressive framework for constructing language models with diverse features. Experiments showed that a WSME model trained with d2-ngrams and word triggers achieved a 20% reduction in perplexity and an 11.05% reduction in character conversion error over a baseline trigram model.
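To make the scoring step concrete, the following is a minimal sketch of how candidate sentences read off a word lattice might be reranked with a whole-sentence exponential model. It assumes the standard WSME form, in which a sentence's log-score is a baseline log-probability plus a weighted sum of sentence-level features; because the normalization constant is the same for every sentence, it can be dropped when only a ranking is needed. The function names, the toy features, and the weights are hypothetical and are not taken from the paper. The sketch also assumes that "d2-ngram" means an n-gram that skips one intervening word, and that every candidate matches the input pinyin equally well, so the channel term is constant.

```python
from typing import Callable, List, Sequence

Sentence = Sequence[str]
Feature = Callable[[Sentence], float]


def d2_bigram_feature(w1: str, w2: str) -> Feature:
    """Hypothetical distance-2 bigram: count w1 ... w2 with exactly one word between."""
    def f(s: Sentence) -> float:
        return float(sum(1 for i in range(len(s) - 2)
                         if s[i] == w1 and s[i + 2] == w2))
    return f


def trigger_feature(trigger: str, target: str) -> Feature:
    """Hypothetical word trigger: 1.0 if `trigger` occurs anywhere before `target`."""
    def f(s: Sentence) -> float:
        words = list(s)
        for i, w in enumerate(words):
            if w == trigger and target in words[i + 1:]:
                return 1.0
        return 0.0
    return f


def wsme_log_score(sentence: Sentence,
                   base_logprob: Callable[[Sentence], float],
                   features: List[Feature],
                   weights: List[float]) -> float:
    """Unnormalized WSME log-score: log P0(W) + sum_i lambda_i * f_i(W)."""
    return base_logprob(sentence) + sum(
        lam * f(sentence) for lam, f in zip(weights, features))


def best_candidate(candidates: List[Sentence],
                   base_logprob: Callable[[Sentence], float],
                   features: List[Feature],
                   weights: List[float]) -> Sentence:
    """Pick the highest-scoring candidate; the constant log Z is ignored."""
    return max(candidates,
               key=lambda s: wsme_log_score(s, base_logprob, features, weights))


# Toy usage with placeholder words and a flat baseline (assumed values).
candidates = [["a", "b", "c"], ["a", "x", "c"]]
flat_baseline = lambda s: -1.0  # stands in for a trigram log-probability
features = [d2_bigram_feature("a", "c"), trigger_feature("x", "c")]
weights = [0.5, 1.0]
best = best_candidate(candidates, flat_baseline, features, weights)
```

In a real system the baseline would be the trigram model and the weights would be estimated from training data; the sketch only shows how sentence-level features enter the score.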
The paper introduces a rough set technique for solving the problem of mining Pinyin-to-character (PT...
This study investigates language modeling for Mandarin continuous speech recognition. Fi...
An N-gram Chinese language model incorporating linguistic rules is presented. By constructing elemen...
With the growth of exchange activities among the four cross-strait regions, the problem of correctly ...
The Pinyin-to-Character Conversion task is the core process of the Chinese pinyin-based input method...
This paper proposes a novel method integrating multi-level linguistic knowledge for Chinese grapheme...
Statistical language modeling, which aims to capture the regularities in human natural l...
Parsing, the task of identifying syntactic components, e.g., noun and verb phrases, in a sentence, i...
The conventional n-gram language model exploits only the immediate context of historical words witho...
We propose a new goal for constructing a Chinese phoneme-to-character automatic conversion system. I...
Grapheme-to-phoneme (G2P) conversion is a very important component in a Text-to-Speech (TTS) system....
Chinese input is one of the key challenges for Chinese PC users. This paper proposes a statistical a...
Parsing, the task of identifying the syntactic components, e.g., noun and verb phrases, in a sentenc...
We present the first known result for named entity recognition (NER) in realistic large-vocabulary sp...