Large language models (LLMs) have demonstrated remarkable open-domain capabilities. Traditionally, LLMs tailored for a domain are trained from scratch to excel at handling domain-specific tasks. In this work, we explore an alternative strategy of continual pre-training as a means to develop domain-specific LLMs. We introduce FinPythia-6.9B, developed through domain-adaptive continual pre-training on the financial domain. The continually pre-trained FinPythia showcases consistent improvements on financial tasks over the original foundation model. We further explore simple but effective data selection strategies for continual pre-training. Our data selection strategies outperform vanilla continual pre-training's performance with just 10% of corp...
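The abstract above refers to data selection for continual pre-training without detailing the criterion. As a minimal sketch, assuming a generic model-based scoring function (the score_document callable and the 10% keep ratio below are illustrative assumptions, not the selection strategy actually used for FinPythia), such a step can be as simple as ranking candidate documents and keeping the top fraction:

# Minimal sketch of score-based data selection for continual pre-training.
# Assumption: `score_document` is a placeholder for whatever criterion is used
# (e.g., loss of the base model on the document); it is not the FinPythia
# method, just an illustration of the general recipe.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Document:
    doc_id: str
    text: str


def select_subset(
    corpus: List[Document],
    score_document: Callable[[Document], float],
    keep_ratio: float = 0.10,
) -> List[Document]:
    """Keep the `keep_ratio` fraction of documents with the highest scores."""
    scored = sorted(corpus, key=score_document, reverse=True)
    n_keep = max(1, int(len(scored) * keep_ratio))
    return scored[:n_keep]


if __name__ == "__main__":
    # Toy example: score by document length as a stand-in for a model-based score.
    toy_corpus = [Document(str(i), "financial text " * i) for i in range(1, 6)]
    selected = select_subset(toy_corpus, score_document=lambda d: len(d.text))
    print([d.doc_id for d in selected])  # keeps the single highest-scoring document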
The pre-training and fine-tuning paradigm has contributed to a number of breakthroughs in Natural La...
Large Language Models (LMs) are known to encode world knowledge in their parameters as they pretrain...
The remarkable success of large language models has been driven by dense models trained on massive u...
Large language models (LLMs) are routinely pre-trained on billions of tokens, only to restart the pr...
Pretrained language models (PTLMs) are typically learned over a large, static corpus and further fin...
Recent advances in NLP have been enabled by a range of large-scale pretrained language models (PLMs). Thes...
Large language models (LLMs) show promise for natural language tasks but struggle when applied direc...
Even though many efficient transformers have been proposed, only a few such models are available for s...
Current pre-trained language models (PLMs) are typically trained with static data, ignoring that in r...
As the demand for sophisticated Natural Language Processing (NLP) models continues to grow, so does ...
Neural network training has been shown to be advantageous in many natural language processing appli...
Recent work on large language models relies on the intuition that most natural language processing t...
Adapting pretrained language models to novel domains, such as clinical applications, traditionally i...
Recent work has demonstrated that pre-training in-domain language models can boost performance when ...
Continual learning (CL) is a setting in which a model learns from a stream of incoming data while av...