The goal of data cleaning is to make data fit for purpose, i.e., to improve data quality through updates and data transformations, so that downstream analyses can be conducted and lead to trustworthy results. A transparent and reusable data cleaning workflow can save time and effort through automation, and can make subsequent cleaning of new data less error-prone. However, the reusability of data cleaning workflows has received little to no attention in the research community. We identify some challenges and opportunities for reusing data cleaning workflows. We present a high-level conceptual model to clarify what we mean by reusability and propose ways to improve reusability along different dimensions. We ...
Learning analytics is the analysis of student data with the purpose of improving learning. However, ...
Data cleaning has become one of the important pre-processing steps for many data science, data analy...
Data sharing is a difficult process for both the data producer and the data reuser. Both parties are...
Before data from multiple sources can be analyzed, data cleaning workflows (“recipes”) usually need ...
Data cleaning and preparation are essential parts of data curation lifecycles and scientific workflo...
Cleaning data (i.e., making sure data contains no errors) can take a large part of a project’s lifet...
Data-centric applications have never been more ubiquitous in our lives, e.g., search engines, route ...
Lightning talk presentation for the 17th International Digital Curation Conference (IDCC22) on the to...
Reviewed by Mário Silva. Data cleaning and Extract-Transform-Load processes are usually modeled as gra...
We classify data quality problems that are addressed by data cleaning and provide an overview of the...
High-quality data is a vital asset for several businesses and applications. With flawed data costing...
Data reuse refers to the secondary use of data—not for its original purpose but for studying new pr...
We study provenance features of OpenRefine, a popular data cleaning tool. In OpenRefine, provenance ...
Scientific data reuse requires careful curation and annotation of the data. Late-stage curation acti...