Language models are widely used in education. Although modern deep learning models achieve strong performance on question-answering tasks, they still make errors. To avoid misleading students with wrong answers, it is important to calibrate the confidence, that is, the prediction probability, of these models. In our work, we propose using an XGBoost classifier on top of BERT to output corrected probabilities, with features derived from the attention mechanism. Our hypothesis is that the level of uncertainty contained in the flow of attention is related to the quality of the model's response itself.
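To make the pipeline concrete, the sketch below shows one way such a calibrator could be assembled, assuming the Hugging Face `transformers` library and `xgboost`. The specific attention features (mean per-layer attention entropy), the toy training data, and the variable names are illustrative assumptions, not the exact design described above.

```python
# A minimal sketch of an attention-based confidence calibrator,
# assuming Hugging Face `transformers` and `xgboost`. The feature
# choice (per-layer attention entropy) is an illustrative assumption.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from xgboost import XGBClassifier

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

def attention_features(text: str) -> np.ndarray:
    """Summarize the attention flow of one input as a fixed-size vector:
    the mean entropy of each layer's attention distributions, one value
    per layer. Higher entropy means more diffuse (uncertain) attention."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    feats = []
    for layer_attn in outputs.attentions:  # shape: (batch, heads, seq, seq)
        probs = layer_attn.clamp_min(1e-12)
        entropy = -(probs * probs.log()).sum(dim=-1)  # entropy per query position
        feats.append(entropy.mean().item())
    return np.array(feats)

# Hypothetical training data: questions paired with labels indicating
# whether the underlying QA model answered them correctly.
texts = ["What is 2 + 2?", "Who wrote Hamlet?"]
qa_model_was_correct = np.array([1, 0])

# The XGBoost calibrator predicts whether the QA model's answer is
# correct; its predicted probability serves as the corrected confidence.
X = np.stack([attention_features(t) for t in texts])
calibrator = XGBClassifier(n_estimators=50, max_depth=3)
calibrator.fit(X, qa_model_was_correct)

corrected_confidence = calibrator.predict_proba(X)[:, 1]
print(corrected_confidence)
```

In practice, the calibrator would be trained on a held-out set of questions with known correctness labels and evaluated with calibration metrics such as expected calibration error.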