Data scientists often implement machine learning algorithms in imperative languages such as Java, Matlab and R. Yet such implementations fail to achieve the performance and scalability of specialised data-parallel processing frameworks. Our goal is to execute imperative Java programs in a data-parallel fashion with high throughput and low latency. This raises two challenges: how to support the arbitrary mutable state of Java programs without compromising scalability, and how to recover that state after failure with low overhead. Our idea is to infer the dataflow and the types of state accesses from a Java program and use this information to generate a stateful dataflow graph (SDG). By explicitly separating data from mutable state, SD...
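The core idea above is to separate data flowing between tasks from mutable state that is held explicitly, so the state can be partitioned for scalability and checkpointed for recovery. A small Java sketch can illustrate this idea. It is not the authors' system or API. The names `SdgSketch`, `StatePartition`, `checkpointAll` and `restoreAll` are hypothetical, and word counting stands in for a machine learning algorithm. The sketch assumes hash partitioning of state by key and a simple copy-based checkpoint.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: mutable state lives in explicit, partitioned state
 * elements, separate from the data items passed between task functions.
 */
final class SdgSketch {

    /** One partition of key-value state, plus a copy kept for recovery. */
    static final class StatePartition {
        private final Map<String, Integer> state = new HashMap<>();
        private Map<String, Integer> checkpoint = new HashMap<>();

        void update(String key, int delta) {
            state.merge(key, delta, Integer::sum);
        }

        int get(String key) {
            return state.getOrDefault(key, 0);
        }

        /** Snapshot the current state (assumed copy-based checkpoint). */
        void takeCheckpoint() {
            checkpoint = new HashMap<>(state);
        }

        /** Simulated recovery: drop live state and reload the snapshot. */
        void restore() {
            state.clear();
            state.putAll(checkpoint);
        }
    }

    private final List<StatePartition> partitions = new ArrayList<>();

    SdgSketch(int numPartitions) {
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new StatePartition());
        }
    }

    /** Hash-partition keys so each key is owned by exactly one partition. */
    StatePartition partitionFor(String key) {
        return partitions.get(Math.floorMod(key.hashCode(), partitions.size()));
    }

    /** Stateless task: a pure transformation of input data into words. */
    static List<String> tokenize(String line) {
        List<String> words = new ArrayList<>();
        for (String w : line.toLowerCase().split("\\s+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    /** Stateful task: each update touches only the partition owning the key. */
    void process(String line) {
        for (String word : tokenize(line)) {
            partitionFor(word).update(word, 1);
        }
    }

    int count(String word) {
        return partitionFor(word).get(word);
    }

    void checkpointAll() {
        for (StatePartition p : partitions) {
            p.takeCheckpoint();
        }
    }

    void restoreAll() {
        for (StatePartition p : partitions) {
            p.restore();
        }
    }

    public static void main(String[] args) {
        SdgSketch sdg = new SdgSketch(4);
        sdg.process("to be or not to be");
        sdg.checkpointAll();
        sdg.process("to do");                 // processed after the checkpoint
        System.out.println(sdg.count("to"));  // prints 3
        sdg.restoreAll();                     // simulated failure and recovery
        System.out.println(sdg.count("to"));  // prints 2
    }
}
```

Because every key maps to a single partition, partitions can in principle be placed on different nodes and checkpointed independently. A complete recovery scheme would also have to replay the inputs processed after the checkpoint, and this sketch omits that step.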
Current approaches to the development of reliable software systems include decomposition i...
Implementing machine learning algorithms for large data, such as the Web graph and social networks, ...
In this thesis, we address the problem of efficiently and automatically scaling iterative computatio...
Data scientists often implement machine learning algorithms in imperative languages such as Java, ...
Big data processing is no longer restricted to specially trained engineers. Instead, domain experts,...
MapReduce and similar systems significantly ease the task of writing data-parallel code. However, m...
This paper presents the parallelization of a machine learning method, called t...
Multi-core processors require a program to be decomposable into independent parts that can execute i...
MapReduce has been widely accepted as a simple programming pattern that can form the basis for effic...
The ability to do rich analytics on massive sets of unstructured data drives the operation of many o...
This work addresses the need for stateful dataflow programs that can rapidly sift through huge, evol...
Dataflow Models of Computation (MoCs) have proven efficient means for modeling...
Parallel dataflow systems are a central part of most analytic pipelines for big data. The iterative ...
This paper presents the Gaspar data-centric framework to develop high performance parallel applicati...