This paper explores lexicographic semirings and their application to problems in speech and language processing. Specifically, we present two instantiations of binary lexicographic semirings, one involving a pair of tropical weights, and the other a tropical weight paired with a novel string semiring we term the categorial semiring. The first of these is used to yield an exact encoding of backoff models with epsilon transitions. This lexicographic language model semiring allows for off-line optimization of exact models represented as large weighted finite-state transducers, in contrast to implicit (on-line) failure transition representations. We present empirical results demonstrating that, even in simple intersection scenarios amenable to ...
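As an illustration of the first instantiation, the following is a minimal sketch of a lexicographic semiring over a pair of tropical weights, assuming the usual definition: ⊗ adds the two components independently, and ⊕ keeps the lexicographically smaller pair. The class name `LexTropical`, the helper `path_weight`, and all arc weights are hypothetical and chosen only for illustration; they are not taken from the paper. The example charges a cost of 1 in the first component to a backoff (epsilon) arc, so a path that avoids backoff wins under ⊕ even when its second component is larger.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class LexTropical:
    """A pair of tropical weights compared lexicographically (hypothetical name)."""
    first: float
    second: float

    def plus(self, other: "LexTropical") -> "LexTropical":
        # Semiring "plus": keep the lexicographically smaller pair.
        mine = (self.first, self.second)
        theirs = (other.first, other.second)
        return self if mine <= theirs else other

    def times(self, other: "LexTropical") -> "LexTropical":
        # Semiring "times": tropical times (ordinary addition) per component.
        return LexTropical(self.first + other.first, self.second + other.second)


ZERO = LexTropical(math.inf, math.inf)  # identity for plus
ONE = LexTropical(0.0, 0.0)             # identity for times


def path_weight(arcs):
    """Combine the arc weights along one path with times."""
    total = ONE
    for weight in arcs:
        total = total.times(weight)
    return total


# Illustrative weights only: an explicit n-gram arc with cost 5.0, versus a
# backoff arc (first component 1) followed by a lower-order arc.
direct = path_weight([LexTropical(0.0, 5.0)])
backoff = path_weight([LexTropical(1.0, 0.5), LexTropical(0.0, 2.0)])
best = direct.plus(backoff)
```

Here `backoff` evaluates to the pair (1.0, 2.5) and `best` is the direct path (0.0, 5.0): the first component decides the comparison, and the second component only breaks ties between paths that use the same amount of backoff.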
Automata and Dictionaries is aimed at students and specialists in natural language processing and re...
The A* algorithm is defined in a directed graph formalism. Pruning, path merging and modification of...
Character-level models of tokens have been shown to be effective at dealing with within-token noise a...
In this paper we introduce a novel use of the lexicographic semiring and motivate its use for speech...
We present a general algorithm, pre-determinization, that makes an arbitrary weighted transducer ove...
The framework of document spanners abstracts the task of information extraction from text as a functi...
This thesis demonstrates how modeling techniques from speech recognition can be advantageous in a va...
In many applications of speech and language processing, we generate intermediate results in the form...
We study the properties of, and relationships between, three classes of quantitative language models computing...
We survey the use of weighted finite-state transducers (WFSTs) in speech recognition. We show that WFSTs...
This paper describes a joint model of word segmentation and phonological alternations, which takes u...
This paper addresses issues in part of speech disambiguation using finite-state transducers and pres...
This paper presents trainable methods for generating letter to sound rules from a given lexicon for ...
In automatic speech recognition, the confidence in a recognition result, i.e., the "degree of belief...