Data processing pipelines that clean, transform and alter data in preparation for learning predictive models affect those models’ accuracy and performance, as well as other properties such as model fairness. It is therefore important to give developers the means to gain an in-depth understanding of how the pipeline steps affect the data, from the raw input to the training sets ready to be used for learning. While other efforts track the creation and changes of pipelines of relational operators, in this work we analyze the typical operations of data preparation within a machine learning process, and provide infrastructure for generating very granular provenance records from it, at the level of individual elements...
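To make the idea of element-level provenance concrete, the following is a minimal sketch, not the infrastructure described in the abstract. It assumes a toy dataset held as a list of dictionaries and a single mean-imputation step. The names `ProvRecord` and `impute_mean`, and the record schema, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class ProvRecord:
    """One element-level provenance record (hypothetical schema)."""
    step: str     # name of the pipeline step that touched the element
    row: int      # row index of the affected element
    column: str   # column name of the affected element
    before: Any   # value before the step
    after: Any    # value after the step


def impute_mean(rows: List[Dict[str, Any]], column: str,
                log: List[ProvRecord]) -> List[Dict[str, Any]]:
    """Replace None in `column` with the mean of the present values.

    Returns new rows (the input is not mutated) and appends one
    ProvRecord to `log` for every element that was changed.
    """
    present = [r[column] for r in rows if r[column] is not None]
    if not present:
        raise ValueError(f"column {column!r} has no values to average")
    mean = sum(present) / len(present)

    out = []
    for i, r in enumerate(rows):
        new_row = dict(r)  # copy so the raw input stays intact
        if r[column] is None:
            new_row[column] = mean
            log.append(ProvRecord("impute_mean", i, column, None, mean))
        out.append(new_row)
    return out


# Toy raw input with two missing elements.
raw = [
    {"age": 30, "income": 40.0},
    {"age": None, "income": 55.0},
    {"age": 50, "income": None},
]

log: List[ProvRecord] = []
clean = impute_mean(raw, "age", log)
clean = impute_mean(clean, "income", log)

for record in log:
    print(record)
```

Each record ties one changed cell to the step that changed it, so a developer could trace a value in the training set back to the raw input. A real pipeline would also need to cover operations that add or drop rows and columns, which this sketch does not attempt.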
In this paper, we propose a provenance model able to represent the provenance of any data object cap...
Data provenance is information about where data come from (provenance data) and how they transform (...
2012-10-12 Provenance, the derivation history of data objects, records how, when, and by whom a piece...
Data processing pipelines that are designed to clean, transform and alter data in preparation for le...
In this work we analyze the typical operations of data preparation within a machine learning process...
Machine Learning (ML) has become essential in several industries. In Computati...
Data provenance tools seek to facilitate reproducible data science and auditable data analyses by ca...
Data provenance is the history of a digital artifact, from the point of collection to its present...
In science, results that are not reproducible by peer scientists are valueless and of no significanc...
Scientists can facilitate data intensive applications to study and understand the behavior of a comp...
Recent years have witnessed increased demand for users to be able to interpret the results of data s...
Data provenance tools seek to facilitate reproducible data science and auditable data analyses by ca...
This dataset is a prototype implementation of a mechanism for linking provenance information and its...
Machine learning (ML) presents new challenges for reproducible software engineering, as the artifact...
Data provenance allows scientists in different domains validating their models and algorithms to fin...