We analyze the learning dynamics of neural language and translation models using Loss Change Allocation (LCA), an indicator that enables a fine-grained analysis of parameter updates when optimizing for the loss function; in other words, it lets us observe the contribution of each network component to the loss reduction during training. In this article, we systematically study masked language modeling, causal language modeling, and machine translation. We show that the choice of training objective leads to distinctive optimization procedures, even when performed on comparable Transformer architectures. We further demonstrate how the various Transformer parameters are used during training, showing that the feed-forward components of each layer are the main contributors to the loss reduction.
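For readers unfamiliar with the indicator: LCA (Lan et al., 2019) decomposes the change in training loss between consecutive optimization steps into per-parameter contributions, in its simplest form via the first-order term grad_i * delta_theta_i. The following minimal NumPy sketch illustrates this first-order variant on a toy quadratic loss; the toy loss, learning rate, and function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Toy quadratic loss L(theta) = 0.5 * ||theta - target||^2 (illustrative only;
# in the paper, L would be the LM or MT training loss of a Transformer).
target = np.array([1.0, -2.0, 0.5])

def loss(theta):
    return 0.5 * np.sum((theta - target) ** 2)

def grad(theta):
    return theta - target

def lca_first_order(theta_before, theta_after):
    """First-order Loss Change Allocation for one optimization step:
    allocate the loss change to parameter i as grad_i * delta_theta_i.
    Negative entries indicate parameters that helped decrease the loss."""
    return grad(theta_before) * (theta_after - theta_before)

# One SGD step, then allocate the resulting loss change across parameters.
theta = np.zeros(3)
lr = 0.1
theta_next = theta - lr * grad(theta)

alloc = lca_first_order(theta, theta_next)
print("per-parameter LCA :", alloc)        # here: [-0.1, -0.4, -0.025]
print("sum of allocations:", alloc.sum())  # approximates the loss change
print("actual loss change:", loss(theta_next) - loss(theta))
```

Note that the first-order terms only approximate the true loss change; the original LCA formulation refines this by evaluating the gradient along the update path with higher-order quadrature, so that the per-parameter allocations sum (up to quadrature error) to the measured loss change. Summing allocations over parameter groups (e.g., the feed-forward versus attention matrices of each layer) yields the component-level view used in the analysis above.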