Real-world databases often contain syntactic and semantic errors, in spite of integrity constraints and other safety measures available in modern DBMSs. We present an iterative statistical framework for inferring missing information and correcting such errors automatically. The key insight of our approach is to exploit dependencies not only within tuples, but also between attributes of related tuples. We draw on techniques from statistical relational learning to develop an efficient approximate inference algorithm that can be implemented in standard DBMSs using SQL and user-defined functions. The resulting framework performs the inference and data cleaning tasks in an integrated manner, using novel techniques to infer correct values accurat...
Building a more accurate reality model requires taking into account imperfect information present in...
Statistical relational models combine aspects of first-order logic, databases and probabilistic grap...
Probabilistic graphical model representations of relational data provide a number of desired feature...
Thesis (Ph.D.)--University of Washington, 2015One of the central challenges of statistical relationa...
Many data sets routinely captured by organizations are relational in nature---from marketing and sal...
Digitally collected data su\ud ↵\ud ers from many data quality issues, such as duplicate, incorrect,...
Public genomic and proteomic databases can be affected by a variety of errors. These errors may invo...
Most theoretical frameworks that focus on data errors and inconsistencies follow logic-based reasoni...
Statistical estimation and approximate query processing have become increasingly prevalent applicati...
One fundamental limitation of classical statistical modeling is the assumption that data is represen...
A majority of scientific and commercial data is stored in relational databases. Probabilistic models...
Abstract—In declarative data cleaning, data semantics are encoded as constraints and errors arise wh...
One of the main challenges that data cleaning systems face is to automatically identify and repair d...
Data Cleaning is a long standing problem, which is grow-ing in importance with the mass of uncurated...
Many data sets routinely captured by organizations are relational in nature— from marketing and sale...
Building a more accurate reality model requires taking into account imperfect information present in...
Statistical relational models combine aspects of first-order logic, databases and probabilistic grap...
Probabilistic graphical model representations of relational data provide a number of desired feature...
Thesis (Ph.D.)--University of Washington, 2015One of the central challenges of statistical relationa...
Many data sets routinely captured by organizations are relational in nature---from marketing and sal...
Digitally collected data su\ud ↵\ud ers from many data quality issues, such as duplicate, incorrect,...
Public genomic and proteomic databases can be affected by a variety of errors. These errors may invo...
Most theoretical frameworks that focus on data errors and inconsistencies follow logic-based reasoni...
Statistical estimation and approximate query processing have become increasingly prevalent applicati...
One fundamental limitation of classical statistical modeling is the assumption that data is represen...
A majority of scientific and commercial data is stored in relational databases. Probabilistic models...
Abstract—In declarative data cleaning, data semantics are encoded as constraints and errors arise wh...
One of the main challenges that data cleaning systems face is to automatically identify and repair d...
Data Cleaning is a long standing problem, which is grow-ing in importance with the mass of uncurated...
Many data sets routinely captured by organizations are relational in nature— from marketing and sale...
Building a more accurate reality model requires taking into account imperfect information present in...
Statistical relational models combine aspects of first-order logic, databases and probabilistic grap...
Probabilistic graphical model representations of relational data provide a number of desired feature...