This paper describes an approach for constructing a mixture of language models based on simple statistical notions of semantics using probabilistic models developed for information retrieval. The approach encapsulates corpus-derived semantic information and is able to model varying styles of text. Using such information, the corpus texts are clustered in an unsupervised manner and a mixture of topic-specific language models is automatically created. The principal contribution of this work is to characterise the document space resulting from information retrieval techniques and to demonstrate the approach for mixture language modelling. A comparison is made between manual and automatic clustering in order to elucidate how the global content ...
language modeling) approaches to information retrieval. Language modeling is a formal probabilistic ...
We propose a topic based approach to language modelling for ad-hoc Information Retrieval (IR). Many ...
Statistical language models used in large-vocabulary speech recognition must properly encapsulate th...
This paper describes an approach for constructing a mixture of language models based on simple stati...
In this paper, an approach for constructing mixture language models (LMs) based on some notion of se...
Topic modeling demonstrates the semantic relations among words, which should be helpful for informat...
In today\u27s world, there is no shortage of information. However, for a specific information need, ...
Statistical language models encapsulate varied information, both grammatical and semantic, present i...
International audienceA new statistical method for Language Modeling and spoken document classificat...
International audienceA new statistical method for Language Modeling and spoken document classificat...
Probabilistic topic models are unsupervised generative models which model document content as a two-...
International audienceMost existing Information Retrieval model including probabilistic and vector s...
Most document clustering algorithms operate in a high dimensional bag-of-words space. The inherent p...
In this thesis, we investigate the use of a probabilistic model for unsupervised clustering of text ...
In this thesis, we investigate the use of a probabilistic model for unsupervised clustering of text ...
language modeling) approaches to information retrieval. Language modeling is a formal probabilistic ...
We propose a topic based approach to language modelling for ad-hoc Information Retrieval (IR). Many ...
Statistical language models used in large-vocabulary speech recognition must properly encapsulate th...
This paper describes an approach for constructing a mixture of language models based on simple stati...
In this paper, an approach for constructing mixture language models (LMs) based on some notion of se...
Topic modeling demonstrates the semantic relations among words, which should be helpful for informat...
In today\u27s world, there is no shortage of information. However, for a specific information need, ...
Statistical language models encapsulate varied information, both grammatical and semantic, present i...
International audienceA new statistical method for Language Modeling and spoken document classificat...
International audienceA new statistical method for Language Modeling and spoken document classificat...
Probabilistic topic models are unsupervised generative models which model document content as a two-...
International audienceMost existing Information Retrieval model including probabilistic and vector s...
Most document clustering algorithms operate in a high dimensional bag-of-words space. The inherent p...
In this thesis, we investigate the use of a probabilistic model for unsupervised clustering of text ...
In this thesis, we investigate the use of a probabilistic model for unsupervised clustering of text ...
language modeling) approaches to information retrieval. Language modeling is a formal probabilistic ...
We propose a topic based approach to language modelling for ad-hoc Information Retrieval (IR). Many ...
Statistical language models used in large-vocabulary speech recognition must properly encapsulate th...