MapReduce has been widely adopted by many business and scientific applications for data-intensive processing of large datasets. There are increasing efforts for workflows and systems to work with the MapReduce programming model and the Hadoop environment, including our work on a higher-level programming model for MapReduce within the Kepler Scientific Workflow System. However, to date, provenance of MapReduce-based workflows and its effects on workflow execution performance have not been studied in depth. In this paper, we present an extension to our earlier work on MapReduce in Kepler to record the provenance of MapReduce workflows created using the Kepler+Hadoop framework. In particular, we present: (i) a data model that is able to capt...
Provenance registration is becoming more and more important, as we increase the size and number of e...
Scientific collaboration increasingly involves data sharing between separate groups. We consider a s...
The provenance of a data product contains information about how the product was derived, and is cruc...
We consider a class of workflows, which we call generalized map and reduce workflows (GMRWs), where ...
Scientific experiments are becoming increasingly large and complex, with a commensurate increase in ...
© 2017 Elsevier B.V. The emergence of Cloud computing provides a new computing paradigm for scientif...
RAMP (Reduce And Map Provenance) is an extension to Hadoop that supports provenance capture and trac...
Huge amounts of data are being generated by IoT devices, and are termed as ‘Big Data’. Big...
Abstract. While a number of scientific workflow systems support data provenance, they primarily foc...
Integrated provenance support promises to be a chief advantage of scientific workflow systems over s...
Large-scale, dynamic and open environments such as the Grid and Web Services build upon existing com...
Many scientists are using workflows to systematically design and run computational experiments. Once...
The Swift parallel scripting language allows for the specification, execution and analysis of large-...
Scientists can facilitate data intensive applications to study and understand the behavior of a comp...