Many data analysis programs are expressed in terms of array operations in sequential loops. However, these programs do not scale well to data that cannot fit in the memory of a single computer, so they have to be rewritten to work on Big Data analysis platforms, such as Map-Reduce and Spark. We present a novel framework, called SQLgen, that automatically translates sequential loops on arrays to distributed data-parallel programs, specifically Spark SQL programs. We further extend this framework by introducing OSQLgen, which automatically parallelizes array-based loop programs to distributed data-parallel programs on block arrays. First, our framework translates the sequential loops on arrays to monoid comprehe...
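To make the loop-to-SQL idea concrete, the following is a minimal sketch, not SQLgen's actual output or algorithm. It assumes matrices are stored as (row, column, value) tuples and uses SQLite from the Python standard library as a single-machine stand-in for Spark SQL. The function names `matmul_loop` and `matmul_sql` and the table names `A` and `B` are hypothetical.

```python
import sqlite3


def matmul_loop(a, b):
    """Sequential triple loop over dense arrays (lists of lists)."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for l in range(k):
                c[i][j] += a[i][l] * b[l][j]
    return c


def matmul_sql(a, b):
    """Same computation as a join plus group-by over (i, j, v) tuples.

    SQLite is only a stand-in here; the query shape is the point.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE A (i INTEGER, j INTEGER, v REAL)")
    conn.execute("CREATE TABLE B (i INTEGER, j INTEGER, v REAL)")
    conn.executemany(
        "INSERT INTO A VALUES (?, ?, ?)",
        [(i, j, v) for i, row in enumerate(a) for j, v in enumerate(row)],
    )
    conn.executemany(
        "INSERT INTO B VALUES (?, ?, ?)",
        [(i, j, v) for i, row in enumerate(b) for j, v in enumerate(row)],
    )
    # The inner loop index becomes the join key; the accumulation
    # c[i][j] += ... becomes SUM(...) grouped by the output indices.
    rows = conn.execute(
        "SELECT A.i, B.j, SUM(A.v * B.v) "
        "FROM A JOIN B ON A.j = B.i "
        "GROUP BY A.i, B.j"
    ).fetchall()
    conn.close()

    # Scatter the result tuples back into a dense array for comparison.
    c = [[0.0] * len(b[0]) for _ in range(len(a))]
    for i, j, v in rows:
        c[i][j] = v
    return c
```

In this sketch the loop nest and the query compute the same product; a distributed engine can evaluate the join and aggregation in parallel, whereas the loop fixes a sequential order.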
Linear algebra operations are at the core of many Machine Learning (ML) programs. At the same time, ...
Modern data analysis is undergoing a "Big Data" transformation: organizations are generating and g...
Optimizing linear algebra operations has been a research topic for decades. The compact languag...
This dissertation advances the state of the art for scalable high-performance graph analytics and da...
While computational modelling gets more complex and more accurate, its calculation costs have been i...
Machine Learning is a research field with substantial relevance for many applications in different a...
This paper provides an in-depth survey on the integration of machine learning and array databases. F...
A semijoin is a relational operator which reduces a relation by selecting a set of tuples that match...
Thesis (S.M.)--Massachusetts Institute of Technology, Computation for Design and Optimization Progra...
Big data programming frameworks are becoming increasingly important for the development of applicati...
Many emerging programming environments for large-scale data analysis, such as Map-Reduce, Spark, and...
This work demonstrates a wide range of applications that use lambda expressions in SQL. Such injecte...
Data summarization is an essential mechanism to accelerate analytic algorithms on large data sets. I...
MLlib is Spark’s library of machine learning functions developed to operate in parallel on clusters....
The suffix array is the key to efficient solutions for myriads of string processing problems in diff...