The rapid growth of fast analytics systems, which require data processing in memory, makes memory capacity an increasingly precious resource. This paper introduces a new compressed data structure called a Compressed Buffer Tree (CBT). Using a combination of techniques including buffering, compression, and serialization, CBTs improve the memory efficiency and performance of the GroupBy-Aggregate abstraction that forms the basis not only of batch-processing models like MapReduce but also of recent fast analytics systems. For streaming workloads, aggregation using the CBT uses 21-42% less memory than using Google SparseHash, with up to 16% better throughput. The CBT is also compared to batch-mode aggregators in MapReduce runtimes such as Phoeni...
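To make the combination of buffering, compression, and serialization concrete, the following is a minimal, hypothetical Python sketch of a GroupBy-Aggregate operator. It is not the CBT algorithm: a real buffer tree pushes buffered updates down through multiple levels, while this sketch uses a single flat buffer. The class name `BufferedAggregator` and its `buffer_limit` parameter are illustrative assumptions. Incoming (key, value) pairs are appended to an uncompressed buffer. When the buffer fills, the pairs are partially aggregated, sorted, serialized with `pickle`, and compressed with `zlib`. Final results are produced by decompressing and merging the stored blocks.

```python
import pickle
import zlib
from typing import Callable, Dict, List, Tuple


class BufferedAggregator:
    """Hypothetical single-level sketch (not the CBT itself): buffer raw
    pairs, then spill partially aggregated, serialized, compressed blocks."""

    def __init__(self, combine: Callable[[int, int], int], buffer_limit: int = 4) -> None:
        self.combine = combine            # associative aggregation, e.g. sum
        self.buffer_limit = buffer_limit  # assumed spill threshold (in pairs)
        self.buffer: List[Tuple[str, int]] = []
        self.blocks: List[bytes] = []     # compressed, serialized partial aggregates

    def insert(self, key: str, value: int) -> None:
        # Cheap append; the aggregation work is deferred until the buffer fills.
        self.buffer.append((key, value))
        if len(self.buffer) >= self.buffer_limit:
            self._spill()

    def _aggregate(self, pairs: List[Tuple[str, int]]) -> Dict[str, int]:
        # Group pairs by key and fold their values with `combine`.
        out: Dict[str, int] = {}
        for k, v in pairs:
            out[k] = self.combine(out[k], v) if k in out else v
        return out

    def _spill(self) -> None:
        if not self.buffer:
            return
        partial = self._aggregate(self.buffer)
        # Sorting places equal key prefixes together, which tends to help zlib.
        items = sorted(partial.items())
        self.blocks.append(zlib.compress(pickle.dumps(items)))
        self.buffer.clear()

    def result(self) -> Dict[str, int]:
        # Flush any remaining pairs, then decompress and merge every block.
        self._spill()
        pairs: List[Tuple[str, int]] = []
        for block in self.blocks:
            pairs.extend(pickle.loads(zlib.decompress(block)))
        return self._aggregate(pairs)


if __name__ == "__main__":
    agg = BufferedAggregator(lambda a, b: a + b, buffer_limit=3)
    for word in "a b a c b a d".split():
        agg.insert(word, 1)
    print(agg.result())  # {'a': 3, 'b': 2, 'c': 1, 'd': 1}
```

Deferring aggregation keeps inserts cheap. Storing only compressed partial aggregates trades some CPU time for memory, which is the general trade-off the abstract describes.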
The growing computational and storage needs of several scientific applications mandate the...
Recent compressed suffix trees targeted to highly repetitive text collections reach excell...
High-performance stream aggregation is critical for many emerging applications that analyze massive ...
Memory is rapidly becoming a precious resource in many data processing environments. This paper int...
Computing in the last decade has been characterized by the rise of data-intensive scalable computin...
Data compression is today essential for a wide range of applications: for example, the Internet and the W...
Analytics is moving to the cloud and data is moving into data lakes. These reside on object storage ...
Distributed systems are now commonly used to manage massive data flooding from the physical world, s...
We present JetStream, a system that allows real-time analysis of large, widely-distributed changing ...
Columnar databases have dominated the data analysis market for their superior performance in query p...
In many data gathering applications, information arrives in the form of continuous streams rather th...
MapReduce is widely applied in high-performance computing for large-scale data processing. However, as...
Gradient boosting tree (GBT), a widely used machine learning algorithm, achieves state-of-the-art pe...
Data mining can be viewed as a result of the natural evolution of information technology. The spread...
Many applications dealing with large data structures can benefit from keeping them in compressed fo...