Kenneweg P, Schulz A, Schroeder S, Hammer B. Intelligent Learning Rate Distribution to Reduce Catastrophic Forgetting in Transformers. In: Yin H, Camacho D, Tino P, eds. Intelligent Data Engineering and Automated Learning – IDEAL 2022: 23rd International Conference, IDEAL 2022, Manchester, UK, November 24–26, 2022, Proceedings. Lecture Notes in Computer Science, vol 13756. Cham: Springer International Publishing; 2022:252-261.
Pretraining language models on large text corpora is a common practice in natural language processing. Fine-tuning of these models is then performed to achieve the best results on a variety of tasks. In this paper, we investigate the problem of catastrophic forgetting in transformer neural networks and question the c...
Communication presented at: 35th International Conference on Machine Learning, held at Stockholmsmä...
Transfer learning with deep convolutional neural networks significantly reduces the computation and ...
GPT-2 and BERT demonstrate the effectiveness of using pre-trained language models (LMs) on various n...
One of the most important and ubiquitous building blocks of machine learning is gradient-based optim...
Catastrophic Forgetting is a behavior seen in artificial neural networks (ANNs) when new information...
This paper considers continual learning of large-scale pretrained neural machine translation model w...
Neural networks have had many great successes in recent years, particularly with the advent of deep ...
Deep neural networks are used in many state-of-the-art systems for machine perception. Once a networ...
The problem of catastrophic forgetting manifested itself in models of neural networks based on the c...
In this paper, we study to what extent neural ranking models catastrophically ...
The advancement of neural network models has led to state-of-the-art performance in a wide range of ...
Continuous learning occurs naturally in human beings. However, Deep Learning methods suffer from a p...
We analyze the learning dynamics of neural language and translation models using Loss Change Allocat...
Learning often requires splitting continuous signals into recurring units, such as the discrete word...
Reinforcement learning (RL) problems are a fundamental part of machine learning theory, and neural n...