Hashing is one of the fundamental techniques used to implement query processing operators such as grouping, aggregation and join. This paper studies the interaction between modern computer architecture and hash-based query processing techniques. First, we focus on extracting maximum hashing performance from super-scalar CPUs. In particular, we discuss fast hash functions and ways to efficiently handle multi-column keys, and we propose the use of a recently introduced hashing scheme called Cuckoo Hashing over the commonly used bucket-chained hashing. In the second part of the paper, we focus on CPU cache usage, dynamically partitioning data streams such that the partial hash tables fit in the CPU cache. Conventional partitioning works as a s...
We present new hash tables for joins, and a hash join based on them, that consumes far less memory a...
High-performance analytical data processing systems often run on servers with large amount...
Fast concurrent hash tables are an increasingly important building block as we scale systems to grea...
In the past decade, the exponential growth in commodity CPU speed has far outpaced advances in memo...
For decades researchers have studied the duality of hashing and sorting for the implementation of th...
Hashing is a well-known and widely used technique for providing O(1) access to large files on second...
Join is an important database operation. As computer architectures evolve, the best join algorithm m...
Extracting valuable information from the rapidly growing field of Big Data faces serious performance...
Modern query engines rely heavily on hash tables for query processing. Overall query performance and...
Existing main-memory hash join algorithms for multi-core can be classified into two camps. ...
Hash join algorithms suffer from extensive CPU cache stalls. This paper shows that the standard hash...
Data processing systems often leverage vector instructions to achieve higher performance. When apply...
High-performance analytical data processing systems often run on servers with large amounts of memor...