Abstract. In prior work it has been shown that the design of scientific workflows can benefit from a collection-oriented modeling paradigm which views scientific workflows as pipelines of XML stream processors. In this paper, we present approaches for exploiting data parallelism in XML processing pipelines through novel compilation strategies to the MapReduce framework. Pipelines in our approach consist of sequences of processing steps that receive XML-structured data and produce, often through calls to “black-box” (scientific) functions, modified (i.e., updated) XML structures. Our main contributions are (i) the development of a set of strategies for compiling scientific workflows, modeled as XML processing pipelines, into parallel MapReduce...
The computer industry is being challenged to develop methods and techniques for affordable data p...
Abstract. XML has been widely adopted across a wide spectrum of applications. Its parsing efficienc...
MapReduce is a programming model and an associated implementation for processing and generating larg...
Abstract. An emerging trend is the use of XML as the data format for many distributed scientific appl...
XML has become a widely used and well-structured data format for digital document handling and messa...
Abstract. XML process networks are a simple, yet powerful programming paradigm for loosely coupled...
Semi-structured information is often represented in the XML format. Although a vast amount of appro...
Abstract. Development of applications that process large scientific datasets is often complicated b...
High-performance streaming applications are becoming a new and distinct domain of programs with the ...
Very large scientific datasets are becoming increasingly available in XML formats. Our earlier bench...
Abstract. MapReduce/Hadoop has gained acceptance as a framework to process, transform, integrate, an...
In online social networking, network monitoring and financial applications, there is a need to quer...
Abstract. A language for semi-structured documents, XML has emerged as the core of the web services...