Simple and Efficient Model Filtering in Statistical Machine Translation

Data availability and distributed computing techniques have allowed statistical machine translation (SMT) researchers to build larger models. However, decoders need to retrieve information efficiently from these models in order to translate an input sentence or a set of input sentences. We introduce an easy-to-implement, general-purpose solution to this problem: we store SMT models as a set of key-value pairs in an HFile. We apply this strategy to two specific tasks: test set hierarchical phrase-based rule filtering and n-gram count filtering for language model lattice rescoring. We compare our approach to alternative strategies and show that its ...
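To make the key-value idea concrete, the following is a minimal sketch, not the paper's implementation. It assumes an in-memory stand-in for an immutable, key-sorted store (the class `SortedKVStore` and the helpers `source_spans` and `filter_rules` are hypothetical names, and no real HFile API is used). It also simplifies hierarchical rules to contiguous source phrases, so rules with gaps are not modelled.

```python
import bisect


class SortedKVStore:
    """Hypothetical in-memory stand-in for an immutable, key-sorted store.

    This only mimics the sorted-key lookup pattern; it is not the HFile API.
    """

    def __init__(self, mapping):
        # Sort once by key; keys are unique, so values are never compared.
        items = sorted(mapping.items())
        self._keys = [k for k, _ in items]
        self._values = [v for _, v in items]

    def get(self, key):
        # Binary search over the sorted keys.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None


def source_spans(sentence, max_len):
    """Yield every contiguous token span of up to max_len tokens."""
    tokens = sentence.split()
    for start in range(len(tokens)):
        last = min(start + max_len, len(tokens))
        for end in range(start + 1, last + 1):
            yield " ".join(tokens[start:end])


def filter_rules(store, test_sentences, max_len=3):
    """Keep only the entries whose source side occurs in the test set."""
    # Deduplicate and sort the queries so lookups proceed in key order.
    queries = sorted(
        {span for s in test_sentences for span in source_spans(s, max_len)}
    )
    kept = {}
    for query in queries:
        value = store.get(query)
        if value is not None:
            kept[query] = value
    return kept
```

A possible use: build the store from a toy rule table, then call `filter_rules(store, ["the cat sat"], max_len=2)` to retrieve only the rules whose source phrase appears in that sentence. Sorting the queries before lookup is a design choice that suits sorted on-disk formats, where in-order access tends to be cheaper than random access.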
In this paper, we start with the existing idea of taking reordering rules automatically derived from...
Language modeling is an important part of both speech recognition and machine translation systems. ...
Independence between sentences is an assumption deeply entrenched in the models and algorithms used ...
Statistical Machine Translation (SMT) is an evolving field where many techniques in Syntactic Patter...
This thesis develops a robust inventory of large-scale lattice rescoring methods that improve the qu...
A Bloom filter (BF) is a randomised data structure for set membership queries. Its space requirement...
Statistical machine translation, the task of translating text from one natural language into another...
2014-07-28 The goal of machine translation is to translate from one natural language into another usi...
Statistical machine translation, as well as other areas of human language processing, have recentl...
N-gram language models are an essential component in statistical natural language processing systems...
© 2014 Association for Computational Linguistics. The combinatorial space of translation derivations...
In current phrase-based SMT systems, more training data is generally better than less. However, a la...