One of the key challenges in deploying data analytics is configuration tuning. Existing approaches to configuration tuning are expensive and overlook the dynamic characteristics of the analytics environment (i.e., frequent changes in workload due to evolving input sizes, or changes in the underlying cluster environment). Such workload and environment changes can cause significant performance degradation, and retuning the configuration to accommodate them can save up to 85% of execution time. We propose SimTune, an approach that accommodates such changes through efficient configuration tuning. SimTune combines workload characterization and multitask Bayesian optimization to identify similarity across ...
RocksDB is a general-purpose embedded key-value store used in multiple different settings. Its versa...
Big Data processing systems (e.g., Spark) have a number of resource configuration parameters, such a...
Optimal configuration is vital for a DataBase Management System (DBMS) to achieve high performance. ...
This experimental study presents several overlooked issues that pose a challenge for data analytics ...
Modern industrial, government, and academic organizations are collecting massive amounts of data ...
Cloud-based solutions are increasingly being used to implement large-scale dynamic data driven appli...
Database and big data analytics systems such as Hadoop and Spark have a large number of configuratio...
In the field of machine learning applied to big data, this thesis work implemented an in...
Thesis (Ph.D.)--University of Washington, 2018. Large-scale data analytics is key to modern science, t...
The distributed data analytics system Spark is a common choice for processing massive volumes of he...
© 2018 Elsevier B.V. Nowadays the world has entered the big data era. Big data processing platforms,...
Model calibration is a major challenge faced by the plethora of statistical analytics packages that ...
Data analytics in the cloud has become an integral part of enterprise business...
Selecting appropriate computational resources for data processing jobs on large clusters is difficul...
Performance of big data systems largely relies on efficient data reconfiguration techniques. Data re...