Hashing is one of the fundamental techniques used to implement query processing operators such as grouping, aggregation and join. This paper studies the interaction between modern computer architecture and hash-based query processing techniques. First, we focus on extracting maximum hashing performance from super-scalar CPUs. In particular, we discuss fast hash functions and ways to efficiently handle multi-column keys, and we propose the use of a recently introduced hashing scheme called Cuckoo Hashing over the commonly used bucket-chained hashing. In the second part of the paper, we focus on CPU cache usage, dynamically partitioning data streams such that the partial hash tables fit in the CPU cache. Conventional partitioning works as a s...
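To make the contrast with bucket-chained hashing concrete, here is a minimal sketch of a two-table cuckoo hash table. It is an illustration under stated assumptions, not the implementation from the abstract above: the class name `CuckooHashTable`, the `max_kicks` bound, the seeded use of Python's built-in `hash`, and the doubling rehash policy are all hypothetical choices made for this example.

```python
class CuckooHashTable:
    """Two-table cuckoo hashing: each key has exactly one candidate slot per table."""

    def __init__(self, capacity=8, max_kicks=32):
        self.capacity = capacity      # slots per table (assumed starting size)
        self.max_kicks = max_kicks    # displacement bound before rehashing (assumed)
        self.seed = 0                 # changed on rehash to get new hash functions
        self.tables = [[None] * capacity, [None] * capacity]

    def _slot(self, which, key):
        # Illustrative hash mixing only; a real engine would use a faster hash.
        return hash((which, self.seed, key)) % self.capacity

    def get(self, key, default=None):
        # A lookup probes at most two slots, with no chain to follow.
        for which in (0, 1):
            entry = self.tables[which][self._slot(which, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return default

    def put(self, key, value):
        # Overwrite in place if the key is already stored.
        for which in (0, 1):
            i = self._slot(which, key)
            entry = self.tables[which][i]
            if entry is not None and entry[0] == key:
                self.tables[which][i] = (key, value)
                return
        # Otherwise insert, evicting occupants to their alternate table.
        entry = (key, value)
        which = 0
        for _ in range(self.max_kicks):
            i = self._slot(which, entry[0])
            displaced = self.tables[which][i]
            self.tables[which][i] = entry
            if displaced is None:
                return
            entry = displaced
            which = 1 - which
        # Too many displacements: grow and rebuild with new hash functions.
        self._rehash(entry)

    def _rehash(self, pending):
        entries = [e for table in self.tables for e in table if e is not None]
        entries.append(pending)
        self.capacity *= 2
        self.seed += 1
        self.tables = [[None] * self.capacity, [None] * self.capacity]
        for k, v in entries:
            self.put(k, v)


table = CuckooHashTable()
for k in range(100):
    table.put(k, k * k)
```

The point of the design is that `get` touches at most two slots regardless of load, whereas a bucket-chained table may follow a pointer chain of unbounded length. The cost moves to insertion, which may displace existing entries or trigger a rebuild.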
We present new hash tables for joins, and a hash join based on them, that consumes far less memory a...
The architectural changes introduced with multicore CPUs have triggered a redesign of main-memory jo...
Modern query engines rely heavily on hash tables for query processing. Overall query performance and...
In the past decade, the exponential growth in commodity CPU speed has far outpaced advances in memo...
For decades researchers have studied the duality of hashing and sorting for the implementation of th...
High-performance analytical data processing systems often run on servers with large amount...
Extracting valuable information from the rapidly growing field of Big Data faces serious performance...
Join is an important database operation. As computer architectures evolve, the best join algorithm m...
Hash join algorithms suffer from extensive CPU cache stalls. This paper shows that the standard hash...
Hashing is a well-known and widely used technique for providing O(1) access to large files on second...
Existing main-memory hash join algorithms for multi-core can be classified into two camps. ...
Data processing systems often leverage vector instructions to achieve higher performance. When apply...
Fast concurrent hash tables are an increasingly important building block as we scale systems to grea...
A number of recent papers have considered the influence of modern computer memory hierarchies on the...