In this position statement, we wish to contribute to the discussion about how to assess the quality and coverage of a model. We believe that BERT's prominence as a single-step pipeline for contextualization and classification highlights the need for benchmarks to evolve concurrently with models. Much recent work has touted BERT's raw power for solving natural language tasks, so we used a 12-layer uncased BERT pipeline with a linear classifier as a quick-and-dirty model to score well on the SemEval 2010 Task 8 dataset for relation classification between nominals. We initially expected the bias from BERT's training to be significant enough to influence downstream tasks, since it is well known that biased training corpora can lead to biased lang...
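To make the setup described above concrete, here is a minimal sketch of such a pipeline, assuming the Hugging Face `transformers` API. The model name `bert-base-uncased`, the entity-marker input format, and the label index are illustrative assumptions, not the authors' exact configuration; `num_labels=19` reflects the SemEval 2010 Task 8 label set (9 relations in both directions plus "Other").

```python
import torch
from transformers import BertTokenizerFast, BertForSequenceClassification

# 12-layer uncased BERT with a randomly initialized linear classification
# head on the [CLS] representation; num_labels=19 covers the 2 x 9 directed
# relations plus "Other" in SemEval 2010 Task 8.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=19
)

# One SemEval-style training example; <e1>/<e2> mark the two nominals.
sentence = (
    "The <e1>author</e1> of a keygen uses a <e2>disassembler</e2> "
    "to look at the raw assembly code."
)
label = torch.tensor([3])  # hypothetical index for Instrument-Agency(e2,e1)

inputs = tokenizer(sentence, return_tensors="pt", truncation=True)

# Standard fine-tuning step: cross-entropy loss over the 19 classes.
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss = model(**inputs, labels=label).loss
loss.backward()
optimizer.step()
```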
As language models have grown in parameters and layers, it has become much harder to train and infer...
Analogies play a central role in human commonsense reasoning. The ability to recognize analogies suc...
Large language models produce human-like text that drives a growing number of applications. However, ...
Fine-tuning pre-trained models has achieved impressive performance on standard natural language pro...
BERTScore (Zhang et al., 2020), a recently proposed automatic metric for machine translation quality...
Pretrained Language Models (PLMs), though popular, have been diagnosed to encode bias against protec...
Pretrained language models are publicly available and constantly finetuned for various real-life app...
Reproducibility is of utmost concern in machine learning and natural language processing (NLP). In t...
In recent years, large Transformer-based Pre-trained Language Models (PLMs) have changed the Natural ...
Evaluating generated text has received new attention with the introduction of model-based metrics in rec...
The outstanding performance recently achieved by Neural Language Models (NLMs) across many Natural La...
Transformer-based masked language models trained on general corpora, such as BERT and RoBERTa, have ...
Contextualized word embeddings have been replacing standard embeddings as the representational knowl...
The evaluation of recent embedding-based evaluation metrics for text generation is primarily based o...