Approximate search algorithms, such as cube pruning in syntactic machine translation, rely on the language model to estimate probabilities of sentence fragments. We contribute two changes that trade between accuracy of these estimates and memory, holding sentence-level scores constant. Common practice uses lower-order entries in an N-gram model to score the first few words of a fragment; this violates assumptions made by common smoothing strategies, including Kneser-Ney. Instead, we use a unigram model to score the first word, a bigram for the second, etc. This improves search at the expense of memory. Conversely, we show how to save memory by collapsing probability and backoff into a single value without changing sentence-level score...
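The fragment-scoring idea in this abstract (a unigram model for the first word, a bigram model for the second, and so on) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `MODELS` tables, their toy probabilities, and the `score_fragment` helper are hypothetical and are not taken from the paper, and backoff for unseen n-grams is deliberately left out.

```python
import math

# Hypothetical toy models, one table per order (an assumption for illustration).
# Each table maps an n-gram tuple to a log10 probability. In the approach the
# abstract describes, each order would be a separately estimated model rather
# than the lower-order entries of a single N-gram model.
MODELS = {
    1: {("the",): math.log10(0.5), ("cat",): math.log10(0.25), ("sat",): math.log10(0.25)},
    2: {("the", "cat"): math.log10(0.5), ("cat", "sat"): math.log10(0.5)},
    3: {("the", "cat", "sat"): math.log10(0.8)},
}
MAX_ORDER = 3


def score_fragment(words):
    """Sum log10 probabilities over a fragment whose left context is unknown.

    Word i (0-based) is scored by the model of order min(i + 1, MAX_ORDER):
    the first word by the unigram model, the second by the bigram model, etc.
    This toy version raises KeyError on unseen n-grams instead of backing off.
    """
    total = 0.0
    for i in range(len(words)):
        order = min(i + 1, MAX_ORDER)
        ngram = tuple(words[i + 1 - order : i + 1])
        total += MODELS[order][ngram]
    return total


fragment_score = score_fragment(["the", "cat", "sat"])
```

Here the fragment score is the sum of one unigram, one bigram, and one trigram log probability. A real system would also need backoff handling and smoothed estimates for every order.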
Approximate sentence matching (ASM) is an important technique for tasks in machine translation (MT) ...
Thesis: S.M., Harvard-MIT Program in Health Sciences and Technology, 2014. Cataloged from PDF version...
Language model fusion helps smart assistants recognize words which are rare in acoustic data but abu...
N-gram language models are an essential component in statistical natural language processing systems...
Many syntactic machine translation decoders, including Moses, cdec, and Joshua, implement bottom-up ...
We examine the ability of several models of computation and storage to explain reading time data. S...
This paper deals with the two fundamental problems concerning the handling of large n-gram language ...
In this paper, we compare the relative effects of segment order, segmentation and segment contiguity...
This paper describes Grammar Learning by Partition Search, a general method for automatically constr...
Efficient methods for storing and querying are critical for scaling high-order m-gram language model...
Language models ...
We study the impact of source length and verbosity of the tuning dataset on the performance of para...
We contribute a faster decoding algorithm for phrase-based machine translation. Translation hypoth...