As machine learning (ML) models grow more complex and their predictions become harder to explain, several methods have been developed to explain a model's behavior in terms of the training data points that most influence it. However, these methods tend to mark outliers as highly influential, which limits the insights practitioners can draw, because such points are not representative of the training data. In this work, we take a step towards finding influential training points that also represent the training data well. We first review methods for assigning importance scores to training points. Given importance scores, we propose a method to select a set of DIVerse INfluEntial (DIVINE) training points as a useful expl...
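The idea of trading importance against representativeness can be illustrated with a small sketch. The abstract above is truncated and does not state the selection objective, so everything here is an assumption made for illustration: the function name `select_diverse_influential`, the weight `gamma`, the RBF similarity with unit bandwidth, and the facility-location coverage term. It is a generic greedy selection, not necessarily the authors' method.

```python
import numpy as np


def select_diverse_influential(X, scores, k, gamma=1.0):
    """Greedily pick k training points that are important and representative.

    Illustrative objective (an assumption, not taken from the source):
        F(S) = sum_{i in S} scores[i] + gamma * sum_j max_{i in S} sim(i, j)
    The second term is a facility-location coverage term: it rewards sets
    whose members are similar to many training points.
    """
    X = np.asarray(X, dtype=float)
    scores = np.asarray(scores, dtype=float)
    n = X.shape[0]

    # Pairwise squared Euclidean distances, then RBF similarity
    # (unit bandwidth is an arbitrary choice for this sketch).
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    sim = np.exp(-sq_dist)

    covered = np.zeros(n)            # best similarity of each point to the chosen set
    taken = np.zeros(n, dtype=bool)  # which candidates are already chosen
    selected = []

    for _ in range(min(k, n)):
        # Marginal coverage gain of adding each candidate (one row per candidate).
        cover_gain = np.maximum(sim, covered[None, :]).sum(axis=1) - covered.sum()
        gain = scores + gamma * cover_gain
        gain[taken] = -np.inf        # never pick the same point twice
        best = int(np.argmax(gain))
        selected.append(best)
        taken[best] = True
        covered = np.maximum(covered, sim[best])
    return selected


# Toy data: two tight clusters plus one far-away outlier with the top score.
X = [[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1], [20, 20]]
scores = [0.5, 0.4, 0.3, 0.5, 0.4, 0.3, 1.0]

by_score_only = select_diverse_influential(X, scores, k=2, gamma=0.0)
with_diversity = select_diverse_influential(X, scores, k=2, gamma=10.0)
```

With `gamma=0.0` the selection reduces to ranking by importance score, so the outlier is chosen first. With a larger `gamma` the coverage term dominates and the two picks come from the two clusters instead.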
Thesis (Ph.D.)--University of Washington, 2020. Modern machine learning algorithms have been able to a...
In addition to reproducing discriminatory relationships in the training data, machine learning (ML) ...
This paper discusses a crowdsourcing based method that we designed to quantify the importance of dif...
Underrepresentation and misrepresentation of protected groups in the training data is a significant ...
Machine learning may be oblivious to human bias but it is not immune to its perpetuation. Marginalis...
Bias in training datasets must be managed for various groups in classification tasks to ensure parit...
Training machine learning (ML) models for natural language processing usually requires lots of data ...
The ability to identify influential training examples enables us to debug training data and explain ...
Good models require good training data. For overparameterized deep models, the causal relationship b...
© 2019 Copyright held by the owner/author(s). This paper introduces fact-checking into Machine Learn...
Machine Learning has become more and more prominent in our daily lives as the Information Age and Fo...
Supervised machine learning is a growing assistive framework for professional decision-making. Yet b...
To encourage ethical thinking in Machine Learning (ML) development, fairness researchers have create...
It is increasingly easy for interested parties to play a role in the development of predictive algor...
With the fast development of algorithmic governance, fairness has become a compulsory property for m...