We analyze the learning dynamics of neural language and translation models using Loss Change Allocation (LCA), an indicator that enables a fine-grained analysis of parameter updates when optimizing for the loss function; in other words, it lets us observe the contribution of each network component to the loss reduction during training. In this article, we systematically study masked language modeling, causal language modeling, and machine translation. We show that the choice of training objective leads to distinctive optimization procedures, even when performed on comparable Transformer architectures. We further demonstrate how the various Transformer parameters are used during training, showing that the feed-forward components of each layer are the main contributors to the loss reduction.
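For readers unfamiliar with the indicator: LCA (Lan et al., 2019) decomposes the change in training loss between consecutive optimization steps into per-parameter contributions, in its simplest form via the first-order term grad_i * delta_theta_i. The following minimal NumPy sketch illustrates this first-order variant on a toy quadratic loss; the toy loss, learning rate, and function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Toy quadratic loss L(theta) = 0.5 * ||theta - target||^2 (illustrative only;
# in the paper, L would be the LM or MT training loss of a Transformer).
target = np.array([1.0, -2.0, 0.5])

def loss(theta):
    return 0.5 * np.sum((theta - target) ** 2)

def grad(theta):
    return theta - target

def lca_first_order(theta_before, theta_after):
    """First-order Loss Change Allocation for one optimization step:
    allocate the loss change to parameter i as grad_i * delta_theta_i.
    Negative entries indicate parameters that helped decrease the loss."""
    return grad(theta_before) * (theta_after - theta_before)

# One SGD step, then allocate the resulting loss change across parameters.
theta = np.zeros(3)
lr = 0.1
theta_next = theta - lr * grad(theta)

alloc = lca_first_order(theta, theta_next)
print("per-parameter LCA :", alloc)        # here: [-0.1, -0.4, -0.025]
print("sum of allocations:", alloc.sum())  # approximates the loss change
print("actual loss change:", loss(theta_next) - loss(theta))
```

Note that the first-order terms only approximate the true loss change; the original LCA formulation refines this by evaluating the gradient along the update path with higher-order quadrature, so that the per-parameter allocations sum (up to quadrature error) to the measured loss change. Summing allocations over parameter groups (e.g., the feed-forward versus attention matrices of each layer) yields the component-level view used in the analysis above.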