Big Data processing systems (e.g., Spark) have a number of resource configuration parameters, such as memory size, CPU allocation, and the number of running nodes. Regular users and even expert administrators struggle to understand the relationship between parameter configurations and the overall performance of the system. In this paper, we address this challenge by proposing a performance prediction framework, called $d$-Simplexed, to build performance models with varied configurable parameters on Spark. We take inspiration from the field of Computational Geometry to construct a $d$-dimensional mesh using Delaunay Triangulation over a selected set of features. From this mesh, we predict execution time for various feature configu...
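The idea of predicting execution time from a mesh can be illustrated with a minimal sketch. It assumes the prediction inside each simplex is a piecewise-linear (barycentric) interpolation of the runtimes measured at the simplex vertices; the excerpt above does not state which interpolation rule the framework actually uses, so this is an assumption. It also assumes that the enclosing simplex has already been located in the Delaunay mesh, and it restricts itself to $d = 2$ (a triangle). The function names, the two features (executor cores and memory in GB), and the runtimes are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]  # hypothetical features: (executor cores, memory in GB)


def barycentric_weights(p: Point, tri: Sequence[Point]) -> Tuple[float, float, float]:
    """Return the barycentric coordinates of p with respect to a triangle (2-simplex)."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate simplex: vertices are collinear")
    w1 = ((y2 - y3) * (p[0] - x3) + (x3 - x2) * (p[1] - y3)) / det
    w2 = ((y3 - y1) * (p[0] - x3) + (x1 - x3) * (p[1] - y3)) / det
    return w1, w2, 1.0 - w1 - w2


def predict_runtime(p: Point, tri: Sequence[Point], runtimes: Sequence[float]) -> float:
    """Linearly interpolate the runtimes measured at the simplex vertices.

    Assumption: p lies inside (or on the boundary of) the simplex; a point
    outside it has at least one negative weight and is rejected.
    """
    weights = barycentric_weights(p, tri)
    if min(weights) < -1e-9:
        raise ValueError("configuration lies outside the simplex")
    return sum(w * t for w, t in zip(weights, runtimes))


# Illustrative (made-up) measurements at three sampled configurations.
simplex = [(2.0, 4.0), (8.0, 4.0), (2.0, 16.0)]
measured_seconds = [120.0, 60.0, 90.0]

estimate = predict_runtime((4.0, 8.0), simplex, measured_seconds)
print(f"predicted runtime: {estimate:.1f} s")
```

In a full pipeline, the mesh itself and the simplex lookup would come from a triangulation library (for example, `scipy.spatial.Delaunay` and its `find_simplex` method); the sketch leaves that step out to stay dependency-free.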
Big Data applications allow the successful analysis of large amounts of data not necessarily structured...
Big data analytics have become widespread as a means to extract knowledge from large datasets. Yet, ...
Thesis (Ph.D.)--University of Washington, 2018. Large-scale data analytics is key to modern science, t...
Big Data frameworks (e.g., Spark) have many configuration parameters, such as memory size, CPU alloc...
Cloud data analytics has become an integral part of enterprise business operations for data-driven in...
In-memory cluster computing platforms have gained momentum in recent years, due to their ability t...
Spark is one of the prevalent big data analytical platforms. Configuring proper resource provision f...
A wide spectrum of big data applications in science, engineering, and industry generate large datase...
Big data frameworks play a vital role in storing, processing, and analysing large datasets. Apache S...
Companies depend on mining data to grow their business more than ever. To achieve optimal performanc...
Spark is one of the most popular big data analytical platforms. To save time, achieve high resource ...
Cloud-based solutions are increasingly being used to implement large-scale dynamic data driven appli...
Big Data frameworks have received tremendous attention from the industry and from academic research ...
Database and big data analytics systems such as Hadoop and Spark have a large number of configuratio...