Peer reviewed. Because data is a central component of many modern systems, the cause of a system malfunction may reside in the data and, specifically, in particular properties of the data. For example, a health-monitoring system designed under the assumption that weight is reported in lbs will malfunction when it encounters weight reported in kilograms. Just as software debugging aims to find bugs in source code or runtime conditions, our goal is to debug data: to identify potential sources of disconnect between the assumptions made about some data and the systems that operate on that data. We propose DataPrism, a framework to identify data properties (profiles) that are the root causes of performance degradation or failure of a data-driven system. Such i...
Software fault prediction (SFP) has become a pivotal aspect in the realm of software quality. Neverthele...
High quality data is a vital asset for several businesses and applications. With flawed data costing...
The usual approach to dealing with imperfections in data is to attempt to eliminate them. However, t...
The recent growth of data science expanded its reach to an ever-growing user base of nonexperts, inc...
“Why does my program crash?” This ever-recurring question of software debugging drives the develope...
Context: Defect prediction research is based on a small number of defect datasets and most are at cl...
Background: The NASA Metrics Data Program data sets have been heavily used in software defect predic...
The ubiquitous nature of software demands that software is released without faults. However, softwar...
This dissertation evaluates the following thesis statement: Program analysis techniques can enable a...
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer...
Peer reviewed. Machine learning tasks entail the use of complex computational pipelines to reach quant...
Described herein are techniques for a Machine Learning (ML) model to learn from a training set and p...
As software evolves, becoming a more integral part of complex systems, modern society becomes more r...
This is a replication package for the "Taxonomy of Real Faults in Deep Learning Systems" paper. The...
Cyber-physical systems, where computing and communication are used to fortify and streamline the ope...