In this work, we study the effect of the training set on statistical language modeling (SLM). We present a corpus selection system based on perplexity and evaluate it in two experiments: selecting an optimal training corpus for a domain-specific SLM, and generating an optimal SLM for a large-vocabulary continuous speech recognition (LVCSR) system. The results show that the choice of training corpus strongly affects SLM performance and that our corpus selection system is effective at finding a suitable corpus. With the help of this system, we generated an SLM for an LVCSR system that yielded a 14.5%--17.7% relative reduction in character error rate.
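The abstract does not describe how the selection system works internally, so the following is only a minimal sketch of the general idea of perplexity-based corpus selection, not our actual system. It assumes an add-one-smoothed unigram model trained on a small in-domain sample and keeps the candidate sentences with the lowest perplexity under that model. The function names `train_unigram`, `perplexity`, and `select_by_perplexity`, the `keep` parameter, and the toy sentences are all hypothetical.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Return an add-one-smoothed unigram probability function.

    Assumption: whitespace tokenization and a single shared slot for
    all unseen tokens. A real system would likely use a higher-order
    n-gram model.
    """
    counts = Counter(tok for s in sentences for tok in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for unseen tokens

    def prob(tok):
        return (counts[tok] + 1) / (total + vocab)

    return prob


def perplexity(prob, sentence):
    """Per-token perplexity of a sentence under the given model."""
    toks = sentence.split()
    if not toks:
        return float("inf")  # empty text is never selected
    log_sum = sum(math.log(prob(t)) for t in toks)
    return math.exp(-log_sum / len(toks))


def select_by_perplexity(in_domain, candidates, keep):
    """Keep the `keep` candidates that the in-domain model finds least surprising."""
    prob = train_unigram(in_domain)
    ranked = sorted(candidates, key=lambda s: perplexity(prob, s))
    return ranked[:keep]


if __name__ == "__main__":
    in_domain = ["the patient has a fever", "the patient needs rest"]
    candidates = ["the patient has rest", "stock markets fell sharply"]
    print(select_by_perplexity(in_domain, candidates, keep=1))
```

Ranking by perplexity and keeping a fixed number of sentences is one possible design; a threshold on the perplexity value would serve equally well in this sketch.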
Language modeling is critical and indispensable for many natural language applications such as auto...
Data selection has shown significant improvements in effective use of training data by extracting se...
This PhD thesis studies the overall effect of statistical language modeling on perplexity and word e...
Abstract. In this paper, we study selection criteria for the use of word trigger pairs in statistica...
Statistical language models (SLMs) for speech recognition have the advantage of robustness, and gram...
Statistical language modelling may not only be used to uncover the patterns which underlie the compo...
Statistical Language Models (LM) are highly dependent on their training resour...
The efficacy of discriminative training in Statistical Machine Translation is heavily dependent on t...
Abstract. The language model is an important component of any speech recognition system. In this p...
Language modeling is an important part for both speech recognition and machine translation systems. ...
Target task matched parallel corpora are required for statistical translation model training. Howe...
This paper introduces a selection-based LM using topic modeling for the purpose of domain adaptation...
We propose and study three different novel approaches for tackling the problem of development set se...
Machine translation is the application of machines to translate text or speech from one natural lang...
Abstract. This paper explores the use of linguistic information for the selection of data to train lan...