Managing fine-grained provenance is a critical requirement for data stream management systems (DSMS): it enables them to address complex applications that require diagnostic capabilities and assurance, and it serves as a supporting technology for other tasks such as revision processing. In this paper, based on an example use case, we motivate the need for fine-grained provenance in stream processing and analyze its requirements. Inspired by these requirements, we investigate different techniques to generate and retrieve stream provenance, and propose a new technique based on operator instrumentation. Ariadne, our provenance-aware DSMS, implements this technique on top of the Borealis system. We propose new optimization techniques to r...
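As a rough sketch of the operator-instrumentation idea named in this abstract (not Ariadne's actual implementation inside Borealis, which the abstract does not detail), the Python fragment below shows operators wrapped so that every output tuple carries the identifiers of the input tuples it was derived from. All names here (StreamTuple, instrument_map, instrument_window_sum) are hypothetical.

    # Hypothetical sketch only: these names are invented, not Ariadne/Borealis APIs.
    from dataclasses import dataclass
    from typing import Callable, Iterable, Iterator

    @dataclass(frozen=True)
    class StreamTuple:
        tid: int                             # unique tuple identifier
        value: float                         # payload carried on the stream
        provenance: frozenset = frozenset()  # ids of contributing source tuples

    def instrument_map(fn: Callable[[float], float],
                       stream: Iterable[StreamTuple],
                       next_tid: int = 100) -> Iterator[StreamTuple]:
        # One-to-one operator: each output inherits its input's provenance
        # plus the input's own id.
        for t in stream:
            yield StreamTuple(next_tid, fn(t.value), t.provenance | {t.tid})
            next_tid += 1

    def instrument_window_sum(stream: Iterable[StreamTuple],
                              size: int,
                              next_tid: int = 1000) -> Iterator[StreamTuple]:
        # Many-to-one operator: a tumbling-window sum whose output carries
        # the union of the provenance of every tuple in the window.
        window: list[StreamTuple] = []
        for t in stream:
            window.append(t)
            if len(window) == size:
                prov = frozenset().union(*(w.provenance | {w.tid} for w in window))
                yield StreamTuple(next_tid, sum(w.value for w in window), prov)
                next_tid += 1
                window = []

    src = [StreamTuple(i, float(i)) for i in range(4)]
    for out in instrument_window_sum(instrument_map(lambda v: 2 * v, src), size=2):
        print(out.tid, out.value, sorted(out.provenance))
    # 1000 2.0 [0, 1, 100, 101]
    # 1001 10.0 [2, 3, 102, 103]

Tracing a result back to its sources then reduces to reading the recorded id sets; the optimization techniques the abstract alludes to would presumably target the run-time and storage overhead of carrying such annotations through every operator.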
In this work we analyze the typical operations of data preparation within a machine learning process...
Scientists can facilitate data intensive applications to study and understand the behavior of a comp...
Many applications now involve the collection of large amounts of data from multiple users, and then ...
Applications that require continuous processing of high-volume data streams have grown in prevalence...
Fine-grained data provenance ensures reproducibility of results in decision making, process control ...
In stream data processing, data arrives continuously and is processed by decision making, process co...
Provenance describes how results are produced starting from data sources, curation, recovery, interm...
Applications that operate over streaming data with high-volume and real-time processing requirements ...
Fine-grained data provenance in data streaming allows linking each result tuple back to the source d...
Data processing pipelines that are designed to clean, transform and alter data in preparation for le...
Often data processing is not implemented by a workflow system or an integration applicati...
Data provenance tools seek to facilitate reproducible data science and auditable data analyses by ca...
Data management is growing in complexity as large-scale applications take advantage of the loosely c...