The recent increase in data and model scale for language model pre-training has led to huge training costs. In scenarios where new data becomes available over time, updating a model instead of fully retraining it would therefore provide significant gains. In this paper, we study the benefits and downsides of updating a language model when the new data comes from new languages - the case of continual learning under language shift. Starting from a monolingual English language model, we incrementally add data from Norwegian and Icelandic to investigate how forward and backward transfer effects depend on the pre-training order and the characteristics of the languages, for different model sizes and learning rate schedulers. Our results show that, while forwa...
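The notions of forward and backward transfer mentioned in the abstract can be made concrete with the standard continual-learning metrics of Lopez-Paz and Ranzato (2017). Below is a minimal sketch, not taken from the paper itself: it assumes a score matrix R where R[i][j] is the model's evaluation score on language j after finishing training stage i (higher is better, e.g. negative log-perplexity), and a `baseline` list holding an untrained model's scores. The names and all numbers are hypothetical.

```python
# Sketch of the standard continual-learning transfer metrics
# (Lopez-Paz & Ranzato, 2017), applied to a language-shift setting.
# R[i][j]: score on language j after training stage i (hypothetical).
# baseline[j]: score of an untrained model on language j (hypothetical).

from typing import List

def backward_transfer(R: List[List[float]]) -> float:
    """Average change on earlier languages after the final stage:
    positive means later languages helped earlier ones, negative
    means forgetting."""
    T = len(R)
    return sum(R[T - 1][j] - R[j][j] for j in range(T - 1)) / (T - 1)

def forward_transfer(R: List[List[float]], baseline: List[float]) -> float:
    """Average head start on each new language from having trained on
    the previous ones, relative to the untrained baseline."""
    T = len(R)
    return sum(R[j - 1][j] - baseline[j] for j in range(1, T)) / (T - 1)

# Hypothetical scores for an English -> Norwegian -> Icelandic order.
R = [[0.80, 0.30, 0.20],
     [0.70, 0.85, 0.40],
     [0.60, 0.75, 0.90]]
baseline = [0.10, 0.10, 0.10]

print(f"BWT = {backward_transfer(R):+.3f}")  # negative => forgetting
print(f"FWT = {forward_transfer(R, baseline):+.3f}")
```

With these hypothetical numbers, BWT = -0.150 (some forgetting of English and Norwegian) and FWT = +0.250 (earlier languages give each new language a head start), illustrating the asymmetry between the two effects that the abstract investigates.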
Pretrained language models (PTLMs) are typically learned over a large, static corpus and further fin...
In real scenarios, a multilingual model trained to solve NLP tasks on a set of languages can be requ...
Computer modeling techniques, when applied to language acquisition problems, give an often unrealiz...
Knowledge transfer between neural language models is a widely used technique t...
Recent work on large language models relies on the intuition that most natural language processing t...
Continual learning (CL) is a setting in which a model learns from a stream of incoming data while av...
Multilingual speech recognition with neural networks is often implemented with batch-learning, when ...
For many (minority) languages, the resources needed to train large models are not available. We inve...
The longstanding goal of multi-lingual learning has been to develop a universal cross-lingual model ...
Recent work has shown that neural models can be successfully trained on multiple languages simultaneou...
Pre-training language models (LMs) on large-scale unlabeled text data makes the model much easier to...
Multilingual language models are widely used to extend NLP systems to low-resource languages. Howeve...
Transfer learning from large language models (LLMs) has emerged as a powerful technique to enable kn...
The current generation of neural network-based natural language processing models excels at learning...