In this paper we present a framework to enable data-intensive Spark workloads on MareNostrum, a petascale supercomputer designed mainly for compute-intensive applications. As far as we know, this is the first attempt to investigate optimized deployment configurations of Spark on a petascale HPC setup. We detail the design of the framework and present benchmark data that provide insight into the scalability of the system. We examine the impact of different configurations, including parallelism, storage, and networking alternatives, and we discuss several aspects of executing Big Data workloads on a computing system based on the compute-centric paradigm. Further, we derive conclusions aiming to pave the way towards systematic and op...
As the adoption of Big Data technologies becomes the norm in an increasing number of scenarios, ther...
With the emergence of various big data platforms in recent years, Apache Spark - a distributed large...
Task-based programming has proven to be a suitable model for high-performance computing (HPC) applic...
Big Data analytics frameworks (e.g., Apache Hadoop and Apache Spark) have been...
We report our experiences porting Spark to large production HPC systems. While Spark performance in ...
© 2014 IEEE. The increasing demands of big data applications have led researchers and practitione...
As dataset sizes increase, data analysis tasks in high performance computing (HPC) are increasingly ...
Deployment of a distributed deep learning technology stack on a large parallel system is a very comp...
Spark has been established as an attractive platform for big data analysis, since it manages to hide...
Best paper award. Spark is being successfully used for big data parallel proces...
The digital era's requirements pose many challenges related to deployment, implementation and effici...
Spark has become one of the main options for large-scale analytics running on top of shared-nothing ...