Best paper award. International audience. Spark is being successfully used for big data parallel processing in many business domains (social media, finance, retail). Spark's scalability, usability, and large user community have motivated developers from scientific domains (bioinformatics, oil and gas, astronomy) to try it. However, the profile of scientific applications, e.g., black-box programs and intense file writes, differs from that of traditional business workflows, which may affect Spark's scalability. We present a scalability analysis of Spark in a real case study in the oil and gas domain. We explore workloads on a 936-core HPC cluster processing 330 GB of scientific data. We show that it scales very well when running long-lasting scientific tasks, but it...
Project Specification The goal of this openlab summer student project is to analyse Apache Spark as...
© 2014 IEEE. The increasing demands of big data applications have led researchers and practitione...
The recent advances in DNA sequencing technology triggered next-generation sequencing (NGS) research...
Scientific analyses commonly compose multiple single-process programs into a dataflow. An end-to-end...
As dataset sizes increase, data analysis tasks in high performance computing (HPC) are increasingly ...
Sheer increase in volume of data over the last decade has triggered research in cluster computing fr...
Processing big data in real-time is challenging due to scalability, information consistency, and fau...
"Sympathy for Data" is a platform that is utilized for Big Data automation analytics. It is based on...
The focus of companies like Google, Amazon etc. is to gain competitive business advantage from the i...
Due to the latest development in the context of Internet of Things, the amount of generated and coll...
In this paper we evaluate and compare two representative and popular distributed processing engines f...
International audience. Big Data analytics frameworks (e.g., Apache Hadoop and Apache Spark) have been...
The sheer increase in the volume of data over the last decade has triggered research in cluster comp...
Spark has become one of the main options for large-scale analytics running on top of shared-nothing ...