In this paper we discuss Falcon, an interactive, deterministic, and declarative data cleaning system. Unlike traditional rule-based systems, Falcon does not rely on the existence of a set of pre-defined data quality rules; instead, it encourages users to explore the data, identify possible problems, and make updates to fix them. The main technical challenge consists in finding a set of rules, expressed as SQL update queries, that are semantically correct and that fix the largest number of errors in the data. Falcon navigates the lattice of candidate rules, interacting with users to gradually check the correctness of a set of rules. We have conducted extensive experiments using both real-world and synthetic datasets to show that Falcon can effectively communi...
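To make the idea of a lattice of SQL update rules more concrete, the following is a minimal sketch of how a single user repair might be generalized into candidate rules. It is not Falcon's implementation: the function names `candidate_rules` and `quote`, the table name `T`, the sample row, and the choice to always keep the condition on the old value are all assumptions made for illustration. Each subset of the remaining attributes yields one candidate `UPDATE` query, so the candidates form the powerset lattice over those attributes, ordered from most general to most specific.

```python
from itertools import combinations


def quote(value):
    """Render a value as a SQL string literal (illustrative only)."""
    return "'" + str(value).replace("'", "''") + "'"


def candidate_rules(table, row, attr, new_value):
    """Enumerate candidate UPDATE rules that generalize one user repair.

    Assumption: every rule keeps the condition `attr = old value` and adds
    equality conditions on some subset of the row's other attributes.
    """
    others = [a for a in row if a != attr]
    rules = []
    # Smaller subsets give more general rules (they match more tuples).
    for k in range(len(others) + 1):
        for subset in combinations(others, k):
            conditions = [(attr, row[attr])] + [(a, row[a]) for a in subset]
            where = " AND ".join(f"{a} = {quote(v)}" for a, v in conditions)
            rules.append(
                f"UPDATE {table} SET {attr} = {quote(new_value)} WHERE {where};"
            )
    return rules


# Hypothetical repair: the user changes city from 'NYC' to 'New York'.
rules = candidate_rules(
    "T", {"city": "NYC", "state": "NY", "zip": "10001"}, "city", "New York"
)
for rule in rules:
    print(rule)
```

With n other attributes there are 2^n candidates, which suggests why asking the user about every rule is impractical and why the order in which rules are checked matters.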
Reviewed by Mário Silva. Data cleaning and Extract-Transform-Load processes are usually modeled as gra...
Data-cleaning (or data-repairing) is considered a crucial problem in many database-related tasks. It...
In this paper, a dynamic setting for data quality improvement is studied. In such a setting, there i...
We present Falcon, an interactive, deterministic, and declarative data cleaning system, which uses S...
High quality data is a vital asset for several businesses and applications. With flawed data costing...
In this paper we present GDR, a Guided Data Repair framework that incorporates user feedback in the ...
Despite the increasing importance of data quality and the rich theoretical and practical contributio...
One of the main challenges that data cleaning systems face is to automatically identify and repair d...
Improving data quality is a time-consuming, labor-intensive and often domain specific operation. Exi...
In declarative data cleaning, data semantics are encoded as constraints and errors arise wh...
We present NADEEF, an extensible, generic and easy-to-deploy data cleaning system. NADEEF distinguis...
Central to a data cleaning system are record matching and data repairing. Matching aims to identify ...
We present SEMANDAQ, a prototype system for improving the quality of relational data. Based on the r...