We explore the roles and interactions of the hyper-parameters governing regularization, and propose a range of values applicable to low-resource neural machine translation. We demonstrate that default or recommended values for high-resource settings are not optimal for low-resource ones, and that more aggressive regularization is needed when resources are scarce, in proportion to their scarcity. We explain our observations by the generalization abilities of sharp vs. flat basins in the loss landscape of a neural network. Results for four regularization factors (batch size, learning rate, dropout rate, and gradient clipping) corroborate our claim. Moreover, we show that optimal results are obtained when using several of these factors, and tha...
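The four factors named above all surface as ordinary knobs in a standard training loop. The following is a minimal PyTorch sketch showing where each one is set; the toy model, the synthetic data, and every numeric value are illustrative placeholders, not the settings or ranges reported in the abstract.

```python
# Minimal sketch: where batch size, learning rate, dropout rate, and gradient
# clipping appear in a training loop. All values below are arbitrary examples.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy data standing in for (source, target) token IDs.
src = torch.randint(0, 1000, (512, 20))
tgt = torch.randint(0, 1000, (512,))
loader = DataLoader(TensorDataset(src, tgt), batch_size=32, shuffle=True)  # batch size

model = nn.Sequential(
    nn.Embedding(1000, 256),
    nn.Dropout(p=0.3),            # dropout rate
    nn.Flatten(),
    nn.Linear(256 * 20, 1000),
)
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)  # learning rate
criterion = nn.CrossEntropyLoss()

for batch_src, batch_tgt in loader:
    optimizer.zero_grad()
    loss = criterion(model(batch_src), batch_tgt)
    loss.backward()
    # Gradient clipping: cap the global gradient norm before the update.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
```

"More aggressive regularization" in this framing means pushing these same knobs further (e.g. a higher dropout rate or a tighter clipping norm); how far to push them as data becomes scarcer is the question the abstract addresses.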
Most neural machine translation models are implemented as a conditional language model framework com...
For resource-rich languages, recent works have shown Neural Network based Language Models (NNLMs)...
Without a real bilingual corpus available, unsupervised Neural Machine Translation (NMT) typically req...
This paper aims to compare different regularization strategies to address a common phenomenon, sever...
We study the role of an essential hyperparameter that governs the training of Transformers for neura...
Neural language models (LMs) based on recurrent neural networks (RNN) are some of the most successfu...
Neural machine translation (NMT) has been a mainstream method for the machine translation (MT) task....
Deep neural language models like GPT-2 are undoubtedly strong at text generation, but often require ...
With the advent of deep neural networks in recent years, Neural Machine Translation (NMT) systems ha...
We analyze the learning dynamics of neural language and translation models using Loss Change Allocat...
Neural machine translation (NMT) has become the de facto standard in the machine translation communi...
Neural networks have been shown to improve performance across a range of natural-language tasks. How...
The quality of a Neural Machine Translation system depends substantially on the availability of siza...
This paper presents a new method to reduce the computational cost when using Neural Networks as Lang...