Data scientists often implement machine learning algorithms in imperative languages such as Java, Matlab and R. Yet such implementations fail to achieve the performance and scalability of specialised data-parallel processing frameworks. Our goal is to execute imperative Java programs in a data-parallel fashion with high throughput and low latency. This raises two challenges: how to support the arbitrary mutable state of Java programs without compromising scalability, and how to recover that state after failure with low overhead. Our idea is to infer the dataflow and the types of state accesses from a Java program and use this information to generate a stateful dataflow graph (SDG). By explicitly separating data from mutable st...
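The abstract's central idea is to separate the data flowing through a program from its mutable state. Once state accesses are explicit and keyed, the state can be split across workers. The following is a minimal single-process sketch of that idea. Its assumptions are a key-partitioned map as the state and a per-record counting task. The names `PartitionedState` and `countTask` are hypothetical and are not the SDG system's API. It only shows how a task that touches state through a key can be routed to the one partition that owns that key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of explicit, key-partitioned mutable state.
// Not the API of the SDG system described in the abstract.
public class StateSeparationSketch {

    // Mutable state kept apart from the data, split into partitions by key hash.
    static final class PartitionedState {
        private final List<Map<String, Integer>> partitions = new ArrayList<>();

        PartitionedState(int numPartitions) {
            for (int i = 0; i < numPartitions; i++) {
                partitions.add(new HashMap<>());
            }
        }

        // Route a key to the partition that owns it (non-negative index).
        int partitionOf(String key) {
            return Math.floorMod(key.hashCode(), partitions.size());
        }

        Map<String, Integer> partition(int index) {
            return partitions.get(index);
        }

        int numPartitions() {
            return partitions.size();
        }

        // Read a value from the owning partition; absent keys count as 0.
        int get(String key) {
            return partitions.get(partitionOf(key)).getOrDefault(key, 0);
        }
    }

    // Hypothetical task: processes one record and updates only the partition
    // that owns the record's key, so partitions could live on separate workers.
    static void countTask(String key, PartitionedState state) {
        Map<String, Integer> local = state.partition(state.partitionOf(key));
        local.merge(key, 1, Integer::sum);
    }

    public static void main(String[] args) {
        PartitionedState state = new PartitionedState(4);
        for (String item : List.of("a", "b", "a", "c", "a")) {
            countTask(item, state);
        }
        System.out.println(state.get("a")); // prints 3
    }
}
```

Each partition is only ever updated by the tasks for its own keys. That property is what would allow partitions to be placed on, and checkpointed by, different nodes in a distributed setting. The sketch itself runs in a single JVM.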
Parallel dataflow systems are a central part of most analytic pipelines for big data. The iterative ...
Current approaches to the development of reliable software systems include decomposition i...
We discuss the role of Java and Web technologies for general simulation. We classify the classes of ...
Big data processing is no longer restricted to specially trained engineers. Instead, domain experts,...
MapReduce and similar systems significantly ease the task of writing data-parallel code. However, m...
This paper presents the parallelization of a machine learning method, called t...
Multi-core processors require a program to be decomposable into independent parts that can execute i...
MapReduce has been widely accepted as a simple programming pattern that can form the basis for effic...
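As a programming pattern, MapReduce splits a computation into three phases. First, a map function is applied independently to each input record and emits key/value pairs. Second, a shuffle groups those pairs by key. Third, a reduce function combines the values for each key. The following is a single-process Java sketch of the canonical word-count example. It has no distributed runtime, and the method names `map`, `shuffle`, `reduce` and `wordCount` are illustrative rather than any framework's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative single-JVM sketch of the MapReduce pattern (word count).
public class MapReduceSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {          // skip the empty token from leading spaces
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Shuffle phase: group all emitted values by key (sorted for stable output).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // Reduce phase: combine all values collected for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));        // map calls are independent of each other
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> g : shuffle(pairs).entrySet()) {
            result.put(g.getKey(), reduce(g.getKey(), g.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("the cat", "the dog")));
        // prints {cat=1, dog=1, the=2}
    }
}
```

Each `map` call depends only on its own input line, and each `reduce` call depends only on its own key group. This independence is what lets a real framework run both phases in parallel across machines.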
This paper presents the Gaspar data-centric framework to develop high performance parallel applicati...
The ability to do rich analytics on massive sets of unstructured data drives the operation of many o...
This work addresses the need for stateful dataflow programs that can rapidly sift through huge, evol...
Dataflow Models of Computation (MoCs) have proven efficient means for modeling...
In this thesis, we address the problem of efficiently and automatically scaling iterative computatio...