Recent advances in Transformer-based large language models have led to significant progress in natural language generation. However, to decode K tokens, an autoregressive model requires K sequential forward passes, which can become a major inference bottleneck for large models. Much non-autoregressive (NAR) research aims to address this sequentiality bottleneck, although many approaches rely on dedicated architectures evaluated on supervised benchmarks. In this work, we study unsupervised pretraining of non-autoregressive T5 models via unrolled denoising and show that it achieves state-of-the-art results on downstream generation tasks such as SQuAD question generation and XSum.
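To illustrate the sequentiality gap described above, the minimal sketch below contrasts autoregressive decoding (one forward pass per emitted token, K passes in total) with the generic iterative-refinement style of NAR decoding, where a whole sequence is denoised in a small, fixed number of parallel passes. The `ar_step` and `denoise_step` callables are hypothetical stand-ins for a model's forward pass, not the paper's actual T5 model or training procedure.

```python
from typing import Callable, List

def decode_autoregressive(ar_step: Callable[[List[int]], int],
                          k: int, bos: int = 0) -> List[int]:
    """K sequential forward passes: each new token conditions on all previous ones."""
    seq = [bos]
    for _ in range(k):
        seq.append(ar_step(seq))   # one forward pass per emitted token
    return seq[1:]

def decode_unrolled_refinement(denoise_step: Callable[[List[int]], List[int]],
                               k: int, num_steps: int = 4,
                               mask_id: int = -1) -> List[int]:
    """A fixed number of refinement passes, independent of the sequence length K."""
    seq = [mask_id] * k            # start from a fully masked/noised sequence
    for _ in range(num_steps):
        seq = denoise_step(seq)    # all K positions are re-predicted in parallel
    return seq

# Toy stand-ins (hypothetical): a copy-and-increment AR model and a
# "+1 per pass" denoiser, just to make the control flow runnable.
if __name__ == "__main__":
    print(decode_autoregressive(lambda s: s[-1] + 1, k=5))
    print(decode_unrolled_refinement(lambda s: [max(t, 0) + 1 for t in s], k=5))
```

The point of the contrast is purely structural: the first loop runs K times regardless of hardware parallelism, while the second runs a constant number of passes, which is why NAR methods can reduce decoding latency.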