Spark is one of the most prevalent big data analytics platforms. Provisioning the right resources for Spark jobs is challenging but essential for organizations that want to save time, achieve high resource utilization, and remain cost-effective. In this paper, we study the challenge of determining parameter values that meet the performance requirements of workloads while minimizing both resource cost and resource utilization time. We propose a simulation-based cost model that accurately predicts job performance. We keep training costs low by using a simulation framework, namely Monte Carlo (MC) simulation, which uses a small amount of data and resources to make reliable predictions for larger datasets and clusters. The ...
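The abstract only outlines the idea of using Monte Carlo simulation to extrapolate from a small run to a larger dataset and cluster. The Python sketch below illustrates that general idea under simplifying assumptions of our own. It is not the authors' cost model. It makes four assumptions. First, a job is modeled as a single stage of independent tasks. Second, per-task durations measured in a small profiling run are representative of the full-size job. Third, the number of tasks grows with input size. Fourth, cost equals executors × runtime × a per-executor-second price. All function names, parameters, and numbers (`simulate_makespan`, `predict_runtime`, `cheapest_config`, `price_per_executor_second`, the profiled durations) are hypothetical.

```python
import heapq
import random
import statistics


def simulate_makespan(task_samples, num_tasks, num_executors, rng):
    """One Monte Carlo trial of a single Spark-like stage (hypothetical model).

    Each of num_tasks tasks gets a duration drawn (with replacement) from
    task_samples, the per-task times observed in a small profiling run, and
    is assigned to whichever executor becomes free first.
    """
    free_at = [0.0] * num_executors  # time at which each executor is idle again
    heapq.heapify(free_at)
    for _ in range(num_tasks):
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + rng.choice(task_samples))
    return max(free_at)  # the stage ends when its last task ends


def predict_runtime(task_samples, num_tasks, num_executors, trials=500, seed=0):
    """Return (mean, 95th percentile) of the simulated stage runtime."""
    rng = random.Random(seed)
    runs = sorted(
        simulate_makespan(task_samples, num_tasks, num_executors, rng)
        for _ in range(trials)
    )
    return statistics.mean(runs), runs[int(0.95 * (len(runs) - 1))]


def cheapest_config(task_samples, num_tasks, deadline, price_per_executor_second,
                    candidates, trials=500):
    """Pick the executor count with the lowest expected cost whose
    95th-percentile runtime still meets the deadline.

    Returns (num_executors, expected_cost, p95_runtime), or None if no
    candidate meets the deadline.
    """
    best = None
    for n in candidates:
        mean_t, p95_t = predict_runtime(task_samples, num_tasks, n, trials)
        if p95_t > deadline:
            continue  # performance requirement not met
        # Assumption: every executor is billed for the whole job duration.
        cost = n * mean_t * price_per_executor_second
        if best is None or cost < best[1]:
            best = (n, cost, p95_t)
    return best


if __name__ == "__main__":
    # Hypothetical per-task durations (seconds) from a small sample run.
    profiled = [1.8, 2.0, 2.1, 2.4, 3.0]
    # Hypothetical full-size job: 400 tasks, 120 s deadline.
    print(cheapest_config(profiled, 400, 120.0, 0.001, [4, 8, 16, 32]))
```

Checking the deadline against the 95th percentile rather than the mean is one way to make the prediction conservative. Because tasks rarely divide evenly across executors, a smaller cluster can turn out cheaper even when a larger one also meets the deadline. For example, with ten equal 2-second tasks, two executors and five executors both cost 20 executor-seconds, while three executors cost 24.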
Intelligent Virtual Machine (VM) provisioning is central to cost- and resource-efficient computation ...
In the cloud computing model, cloud providers invoice clients for resource con...
The increase in the volume and variety of data has increased the reliance of data scientists on shar...
Spark is one of the most popular big data analytical platforms. To save time, achieve high resource ...
Big Data processing systems (e.g., Spark) have a number of resource configuration parameters, such a...
Big Data frameworks (e.g., Spark) have many configuration parameters, such as memory size, CPU alloc...
Traditional resource management techniques that rely on simple heuristics often fail to achieve pred...
Big Data frameworks have received tremendous attention from the industry and from academic research ...
Cloud data analytics has become an integral part of enterprise business operations for data-driven in...
Spark has attracted growing attention in the past couple of years as an in-memory cloud computing platf...
Apache Spark jobs are often characterized by processing huge data sets and, therefore, require runti...
In-memory cluster computing platforms have gained momentum in recent years, due to their ability t...
Companies depend on mining data to grow their business more than ever. To achieve optimal performanc...
Design space exploration refers to the evaluation of implementation alternatives for many engineerin...