State-of-the-art encoder-decoder models (e.g. for machine translation (MT) or speech recognition (ASR)) are constructed and trained end-to-end as an atomic unit. No component of the model can be (re-)used without the others. We describe LegoNN, a procedure for building encoder-decoder architectures with decoder modules that can be reused across various MT and ASR tasks, without the need for any fine-tuning. To achieve reusability, the interface between encoder and decoder modules is grounded to a sequence of marginal distributions over a discrete vocabulary pre-defined by the model designer. We present two approaches for ingesting these marginals; one is differentiable, allowing the flow of gradients across the entire network, and the ...
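A minimal sketch, assuming a PyTorch-style implementation, of the differentiable interface this abstract describes: the encoder emits a marginal distribution over a pre-defined vocabulary at each position, and the decoder ingests those marginals as expected embeddings so gradients still flow end-to-end. All module and parameter names below are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn


class MarginalInterface(nn.Module):
    """Projects encoder states to per-position vocabulary marginals."""

    def __init__(self, d_model: int, vocab_size: int):
        super().__init__()
        self.proj = nn.Linear(d_model, vocab_size)

    def forward(self, enc_states: torch.Tensor) -> torch.Tensor:
        # (batch, src_len, d_model) -> (batch, src_len, vocab_size)
        return self.proj(enc_states).softmax(dim=-1)


class DifferentiableIngestion(nn.Module):
    """Maps marginals back into continuous inputs for the reusable decoder."""

    def __init__(self, vocab_size: int, d_model: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)

    def forward(self, marginals: torch.Tensor) -> torch.Tensor:
        # Expected embedding under the marginal distribution; gradients
        # pass through the probabilities back into the encoder.
        return marginals @ self.embed.weight


if __name__ == "__main__":
    batch, src_len, d_model, vocab = 2, 7, 512, 1000
    enc_states = torch.randn(batch, src_len, d_model)
    marginals = MarginalInterface(d_model, vocab)(enc_states)
    dec_inputs = DifferentiableIngestion(vocab, d_model)(marginals)
    print(marginals.shape, dec_inputs.shape)  # (2, 7, 1000) (2, 7, 512)
```

Because the interface is a distribution over a shared, designer-chosen vocabulary rather than an opaque hidden state, a decoder trained behind one encoder can, per the abstract, be reattached to another without fine-tuning.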
Scaling multilingual representation learning beyond the hundred most frequent languages is challengi...
Neural encoder-decoder models for language generation can be trained to predict words directly from ...
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NM...
This thesis introduces the concept of an encoder-decoder neural network and develops architectures f...
The attention-based encoder-decoder is an effective architecture for neural machine translation (NMT),...
A morphologically complex word (MCW) is a hierarchical constituent with meaning-preserving subunits,...
Sharing source and target side vocabularies and word embeddings has been a popular practice in neura...
We investigate input-conditioned hypernetworks for multi-tasking in NLP, generating parameter-effici...
Multi-encoder models are a broad family of context-aware Neural Machine Translation (NMT) systems th...
State-of-the-art multilingual machine translation relies on a universal encoder-decoder, which requi...
Neural network language models are often trained by optimizing likelihood, but we would prefer to op...
In encoder-decoder based sequence-to-sequence modeling, the most common practice is to stack a numbe...
In recent years, deep learning has enabled impressive achievements in Machine Translation. Neural Mac...
Pre-trained language models have received extensive attention in recent years. However, it is still chall...
State-of-the-art multilingual machine translation relies on a shared encoder-decoder. In this paper,...