Transformer-based masked language models trained on general corpora, such as BERT and RoBERTa, have shown impressive performance on various downstream tasks. Increasingly, researchers are "finetuning" these models to improve performance on domain-specific tasks. Here, we report a broad study in which we applied 14 transformer-based models to 11 scientific tasks in order to evaluate how downstream performance is affected by changes along various dimensions (e.g., training data, model size, pretraining time, finetuning length). In this process, we created the largest and most diverse scientific language model to date, ScholarBERT, by training a 770M-parameter BERT model on a 221B-token scientific literature dataset spanning many disciplines....
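The abstract above centers on finetuning pretrained masked language models for downstream tasks. As a rough illustration of what that workflow looks like, here is a minimal sketch using the Hugging Face Transformers library; the `bert-base-uncased` checkpoint, the toy texts, and the binary labels are placeholders chosen for the example, not the models, datasets, or tasks evaluated in the study.

```python
# Hedged sketch: finetuning a pretrained BERT-style encoder on a downstream
# classification task. Checkpoint, data, and labels are illustrative only.
import torch
from torch.optim import AdamW
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoint = "bert-base-uncased"              # placeholder; any BERT-style model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

texts = ["protein folding is hard", "stock markets fell today"]   # toy inputs
labels = torch.tensor([1, 0])                 # toy binary labels

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):                            # a few passes over the toy batch
    out = model(**batch, labels=labels)       # returns a loss when labels are given
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```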
Large Language Models have become the core architecture upon which most modern natural language proc...
Scaling language models with more data, compute and parameters has driven significant progress in na...
Distilling state-of-the-art transformer models into lightweight student models is an effective way t...
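The distillation abstract above is truncated, but the underlying technique is standard: train a lightweight student against the teacher's temperature-softened output distribution in addition to the gold labels. The sketch below shows that generic Hinton-style objective under assumed hyperparameters (temperature `T`, mixing weight `alpha`); it is not the specific recipe from that paper.

```python
# Hedged sketch of a generic knowledge-distillation loss (soft targets + hard labels).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft-target term: KL divergence between temperature-scaled distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                # rescale gradients for the temperature
    # Hard-target term: ordinary cross-entropy against the gold labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Toy usage with random logits for a batch of 4 examples and 3 classes.
student = torch.randn(4, 3)
teacher = torch.randn(4, 3)
gold = torch.tensor([0, 2, 1, 0])
loss = distillation_loss(student, teacher, gold)
```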
Recently, the development of pre-trained language models has brought natural language processing (NL...
This paper describes the models developed by the AILAB-Udine team for the SMM4H 22 Shared Task. We e...
Language models demonstrate both quantitative improvement and new qualitative capabilities with incr...
We introduce BitFit, a sparse-finetuning method where only the bias-terms of the model (or a subset ...
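Since the BitFit abstract describes finetuning only a model's bias terms, a short sketch may help make the idea concrete: freeze every parameter whose name does not contain "bias" and optimize the rest. The checkpoint name and the choice to also leave the new classification head trainable are assumptions for illustration, not taken from the paper.

```python
# Hedged sketch of bias-only (BitFit-style) finetuning with a BERT-style model.
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2          # placeholder checkpoint and task size
)

trainable = []
for name, param in model.named_parameters():
    # Keep bias terms (and, here, the freshly initialized classifier head) trainable.
    if "bias" in name or name.startswith("classifier"):
        param.requires_grad = True
        trainable.append(name)
    else:
        param.requires_grad = False

optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)
print(f"training {len(trainable)} of {len(list(model.named_parameters()))} parameter tensors")
```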
Pre-trained language models have been dominating the field of natural language processing in recent ...
The Bidirectional Encoder Representations from Transformers (BERT) is currently one of the most impo...
Mixture of Experts layers (MoEs) enable efficient scaling of language models through conditional com...
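The MoE abstract is cut off at "conditional com[putation]", but the general mechanism is a learned router that activates only a few experts per token. The following sketch of a top-k-routed MoE layer is a generic illustration under assumed sizes (`d_model`, `n_experts`, `k`); it is not the architecture from that paper.

```python
# Hedged sketch of conditional computation in a Mixture-of-Experts layer:
# a router sends each token to its top-k experts and combines their outputs.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=4, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                              # x: (tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)     # routing probabilities
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)
        out = torch.zeros_like(x)
        # Only the selected experts run on each token (conditional computation).
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e
                if mask.any():
                    out[mask] += topk_scores[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

moe = TopKMoE()
tokens = torch.randn(10, 64)                           # 10 tokens, d_model = 64
print(moe(tokens).shape)                               # torch.Size([10, 64])
```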
In this position statement, we wish to contribute to the discussion about how to assess quality and ...
Thesis (Ph.D.)--University of Washington, 2023. Language models (LMs) are at the core of almost all st...
Recent trends in language modeling have focused on increasing performance through scaling, and have ...
In the last five years, the rise of the self-attentional Transformer-based architectures led to stat...
Transformer-based language models have become a key building block for natural language processing. ...