Big data programming frameworks are becoming increasingly important for developing applications in which performance and scalability are critical. In these complex frameworks, optimizing code by hand is hard and time-consuming, which makes automated optimization particularly necessary. A prerequisite for automating optimization is finding suitable abstractions to represent programs, for instance algebras based on monads or monoids to represent distributed data collections. Currently, however, such algebras do not represent recursive programs in a way that allows them to be analyzed or rewritten. In this paper, we extend a monoid algebra with a fixpoint operator that represents recursion as a first-class citizen and show how it ...
Recent work showed that compiling functional programs to use dense, serialized memory representation...
Metaheuristics have been showing interesting results in solving hard optimization problems. However...
With the emergence of massive datasets across different application domains, there is a rapidly grow...
The goal of my PhD is to study the optimization and the distribution of queries, especially recursiv...
We present an algebra with a fixpoint operator which is suitable for modeling computations with dist...
The goal of my thesis is to study the optimization and distribution of queries, mainly re...
Traditional databases face scalability and efficiency problems when dealing with a vast amount o...
In the Big Data era, there is a resurgence of interest in using Datalog to express data analysis app...
In the past, the semantic issues raised by the non-monotonic nature of aggregates often prevented th...
This paper proposes a model for specifying data flow-based parallel data proc...
Many data analysis programs are expressed in terms of array operations in sequential loops. Ho...
Writing high performance code has steadily become more challenging since the design of computing sys...
The exploding demand for analytics has refocused the attention of data scientists on applic...
A long version is also available as a research report under the same name. Bala...
This is an extended version of Modeling Big Data Processing Programs, by Joao Batista de Souza Neto,...