In this article we deal with the text segmentation problem in statistical language modeling for under-resourced languages whose writing system has no word boundary delimiters. While the lack of text resources has a negative impact on the performance of language models, the errors introduced by automatic word segmentation make those data even less usable. To better exploit the text resources, we propose a method based on weighted finite state transducers to estimate the N-gram language model from a training corpus in which each sentence is segmented in multiple ways instead of a unique segmentation. The multiple segmentation generates more N-grams from the training corpus and allows obtaining the N-grams not f...
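The idea of counting N-grams over several segmentations of the same sentence can be illustrated with a minimal sketch. It is not the weighted finite state transducer method of the article: it enumerates segmentations by plain recursion against a word list, and it assumes, purely for illustration, that each segmentation of a sentence receives an equal fractional weight. The names `segmentations`, `ngram_counts`, and the toy vocabulary are hypothetical.

```python
from collections import Counter


def segmentations(text, vocab):
    """Yield every split of `text` into words that all belong to `vocab`."""
    if not text:
        yield []
        return
    for i in range(1, len(text) + 1):
        head = text[:i]
        if head in vocab:
            for rest in segmentations(text[i:], vocab):
                yield [head] + rest


def ngram_counts(sentences, vocab, n=2):
    """Accumulate fractional n-gram counts over all segmentations.

    Assumption (not from the article): the segmentations of one
    sentence share a total weight of 1, split uniformly.
    """
    counts = Counter()
    for sentence in sentences:
        segs = list(segmentations(sentence, vocab))
        if not segs:
            continue  # no segmentation is possible with this vocabulary
        weight = 1.0 / len(segs)
        for seg in segs:
            padded = ["<s>"] + seg + ["</s>"]
            for i in range(len(padded) - n + 1):
                counts[tuple(padded[i:i + n])] += weight
    return counts


if __name__ == "__main__":
    toy_vocab = {"a", "b", "ab"}  # hypothetical word list
    for ngram, c in sorted(ngram_counts(["ab"], toy_vocab).items()):
        print(ngram, c)
```

With a single segmentation only the bigrams of that one split would be observed; enumerating both splits of the toy string also yields bigrams involving the alternative word, which is the effect the abstract describes. Exhaustive enumeration grows exponentially with sentence length, so a lattice or transducer representation is the practical choice on real corpora.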
This thesis addresses automatic speech recognition for under-resourced languages and ...
We tackle the previously unaddressed problem of unsupervised determination of the optimal morphologi...
In this paper, an extension of n-grams is proposed. In this extension, the memory of the model (n) i...
This PhD thesis focuses on the problems encountered when developing automatic speech recognition for...
Language modeling is a vast sub-field of natural language processing and this work focuses on solvin...
This paper introduces a new statistical approach to automatically partitioning text into coherent ...
This paper describes an extension of the n-gram language model: the similar n-...
This paper introduces a new statistical approach to partitioning text automatically into coherent se...
In domains with insufficient matched training data, language models are often constructed by interpo...