We propose using n-best reranking to enhance sequence-level knowledge distillation (Kim and Rush, 2016), exploring hypotheses beyond the top-1 to obtain more accurate pseudo-labels. To accomplish this, we leverage a diverse set of models with different inductive biases, objective functions, or architectures, including publicly available large pretrained models. The effectiveness of our proposal is validated through experiments on the WMT'21 German-English and Chinese-English translation tasks. Our results demonstrate that training on the pseudo-labels generated by our n-best reranker yields a significantly more accurate student model. In fact, our best student model achieves accuracy comparable to a large translation model fro...
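To make the pipeline concrete, the sketch below shows one way n-best reranking can produce pseudo-labels for sequence-level distillation: a teacher decodes an n-best list for each source sentence, several scorers with different inductive biases score every hypothesis, and the hypothesis with the best combined score becomes the training target for the student. This is a minimal illustration under assumed interfaces; the names (generate_nbest, scorers, weights, rerank_nbest) are hypothetical placeholders and not the paper's actual implementation.

```python
# Minimal sketch of n-best reranking for sequence-level knowledge distillation.
# All function and variable names here are hypothetical placeholders.

from typing import Callable, List, Sequence, Tuple


def rerank_nbest(
    source: str,
    hypotheses: Sequence[str],
    scorers: Sequence[Callable[[str, str], float]],
    weights: Sequence[float],
) -> str:
    """Return the hypothesis with the highest weighted combined score.

    Each scorer maps (source, hypothesis) to a scalar, e.g. the length-normalized
    log-probability under a forward model, a right-to-left model, or a large
    pretrained multilingual model.
    """
    best_hyp, best_score = hypotheses[0], float("-inf")
    for hyp in hypotheses:
        combined = sum(w * score(source, hyp) for score, w in zip(scorers, weights))
        if combined > best_score:
            best_hyp, best_score = hyp, combined
    return best_hyp


def build_distillation_data(
    sources: Sequence[str],
    generate_nbest: Callable[[str], List[str]],  # teacher decoding, e.g. beam search with n > 1
    scorers: Sequence[Callable[[str, str], float]],
    weights: Sequence[float],
) -> List[Tuple[str, str]]:
    """Pair every source sentence with its reranked pseudo-label.

    The resulting (source, pseudo-label) pairs replace the references when
    training the student, as in sequence-level knowledge distillation.
    """
    return [
        (src, rerank_nbest(src, generate_nbest(src), scorers, weights))
        for src in sources
    ]
```

In this sketch the combination is a simple weighted sum of scorer outputs; in practice the weights would be tuned on a held-out set against a translation quality metric.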
Deep learning for Information Retrieval (IR) requires a large amount of high-quality query-document ...
Even though the rise of the Neural Machine Translation (NMT) paradigm has brought a great...
Knowledge distillation (KD) is an effective framework to transfer knowledge from a large-scale teach...
We consider language modelling (LM) as a multi-label structured prediction task by re-framing traini...
Neural machine translation (NMT) systems have greatly improved the quality available from machine tr...
Knowledge distillation (KD) is the process of transferring knowledge from a large model to a small o...
In this thesis, we discuss two issues in the learning to rank area, choosing effective objective lo...
We propose a new framework for N-best reranking on sparse feature sets. The idea is to refo...
Statistical machine translation (SMT) is a method of translating from one natural language (NL) to a...
Knowledge Distillation (KD), which transfers the knowledge of a well-trained large model (teacher) t...
Recent studies have demonstrated the great potential of Large Language Models (LLMs) serving as zero...
This paper describes the application of discriminative reranking techniques to the problem of machi...
Most statistical machine translation systems take advantage of a re-scoring step that is applied on ...