Spark is one of the most popular big data analytical platforms. To save time, achieve high resource utilization, and remain cost-effective for Spark jobs, it is challenging but imperative for data scientists to configure suitable resource portions. In this paper, we investigate the proper parameter values that meet workloads’ performance requirements with minimized resource cost and resource utilization time. We propose SimCost, a simulation-based cost model, to predict the performance of jobs accurately. We achieve low-cost training by taking advantage of a simulation framework, i.e., Monte Carlo simulation, which uses a small amount of data and resources to make a reliable prediction for larger datasets and clusters. Our method’s salient fea...
Traditional resource management techniques that rely on simple heuristics often fail to achieve pred...
Big data frameworks play a vital role in storing, processing, and analysing large datasets. Apache S...
Best paper award. Spark is being successfully used for big data parallel proces...
Spark is one of the prevalent big data analytical platforms. Configuring proper resource provision f...
Big Data processing systems (e.g., Spark) have a number of resource configuration parameters, such a...
Spark has gained growing attention in the past couple of years as an in-memory cloud computing platf...
Companies depend on mining data to grow their business more than ever. To achieve optimal performanc...
Apache Spark jobs are often characterized by processing huge data sets and, therefore, require runti...
Big Data frameworks (e.g., Spark) have many configuration parameters, such as memory size, CPU alloc...
The effective selection of resources on supercomputers and grids improves workload scheduling and ...
Spark is an in-memory framework for implementing distributed applications of various types. Predicti...
In the cloud computing model, cloud providers invoice clients for resource con...
Nowadays deployment of data-intensive systems in multi-dimensional domains is achieved with insuffic...
After a decade of diffusion, cloud computing has received wide acceptance, but it is not yet attract...
Job schedulers in high energy physics require accurate information about predicted resource consumpt...