In this paper we show that clustering alphabet symbols before probabilistic deterministic finite automaton (PDFA) inference reduces perplexity on new data. This result is especially important in real tasks, such as spoken language interfaces, in which data sparseness is a significant issue. We describe the application of the ALERGIA algorithm, combined with an independent clustering technique, to the Air Travel Information System (ATIS) task. A 25% reduction in perplexity was obtained. This result outperforms a trigram model under the same simple smoothing scheme.
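The two ideas in the abstract, replacing symbols by cluster labels before inference and scoring a model by perplexity on new data, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `word_to_cluster` mapping, the `<UNK>` label, and both function names are hypothetical, and the per-symbol probabilities are assumed to come from whatever model has been inferred.

```python
import math

def map_to_clusters(sentence, word_to_cluster, unknown="<UNK>"):
    """Replace each word by its cluster label (hypothetical mapping).

    Words missing from the mapping fall back to an assumed unknown label.
    """
    return [word_to_cluster.get(word, unknown) for word in sentence]

def perplexity(symbol_probs):
    """Perplexity of a test sample, given the probability the model
    assigned to each symbol: 2 ** (-(1/N) * sum(log2 p_i))."""
    n = len(symbol_probs)
    log_sum = sum(math.log2(p) for p in symbol_probs)
    return 2 ** (-log_sum / n)

# Illustrative mapping only; the clusters used in the paper are not given here.
word_to_cluster = {"boston": "CITY", "denver": "CITY", "to": "to", "from": "from"}
clustered = map_to_clusters(["from", "boston", "to", "denver"], word_to_cluster)
```

Merging rare words such as city names into one label gives the inference step more observations per symbol, which is the intuition behind clustering under data sparseness.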
Probabilistic DFA inference is the problem of inducing a stochastic regular grammar from a positive...
Codebook-based feature encodings are a standard framework for image recognition issues. A codebook i...
Text clustering is an established technique for improving quality in information retrieval, for both...
In this paper, we propose a way of incorporating additional knowledge in probabilistic automata infe...
The paper presents a scalable method for learning probabilistic real-time automata (PRTAs), a new ty...
Applications of probabilistic grammatical inference are limited due to time an...
In this paper, we aim at correcting distributions of noisy samples in order to...
Known algorithms for learning PDFA can only be shown to run in time polynomial in the so-called dis...
Automatic speech recognition has matured into a commercially successful technology, enabling voice-b...
Angluin's L* algorithm learns the minimal (complete) deterministic finite auto...
We propose a new method to improve the accuracy of Text Categorization using Two-dimensional Cluster...
First we propose a reformulation of the Integer Linear Programming (ILP) clustering method we intro...
The probabilistic real-time automaton (PRTA) is a representation of dynamic processes arising in the...
We propose a new type of undirected graphical models called a Combinatorial Markov Random Field (Com...
We study the problem of clustering uncertain objects whose locations are described by proba...