Previous work [1] has claimed that the best-performing implementation of in-memory hash joins is based on (radix-)partitioning of the build-side input. Indeed, despite the overhead of partitioning, the benefits from increased cache locality and synchronization-free parallelism in the build phase outweigh the costs when the input data is randomly ordered. However, many datasets already exhibit significant spatial locality (i.e., non-randomness) due to the way data items enter the database: through periodic ETL or trickle-loaded in the form of transactions. In such cases, the first benefit of partitioning, increased locality, is largely irrelevant. In this paper, we demonstrate how hardware transactional memory (HTM) can render the other be...
Abstract. Driven by the two main hardware trends, increasing main memory and massively parallel multi...
Recently, Haas and Hellerstein proposed the hash ripple join algorithm in the context of online aggr...
Join is the most important and expensive operation in relational databases. The parallel join operat...
We present new hash tables for joins, and a hash join based on them, that consumes far less memory a...
Traditionally, analytical database engines have used task parallelism provided by modern multisocket...
In the past decade, the exponential growth in commodity CPU speed has far outpaced advances in memo...
The widening performance gap between CPU and disk is significant for hash join performance. Most cur...
Abstract. Existing main-memory hash join algorithms for multi-core can be classified into two camps. ...
Data movement between memory and CPU is a well-known energy bottleneck for analytics. Near-Memory Pr...
The architectural changes introduced with multicore CPUs have triggered a redesign of main-memory jo...
Abstract. The architectural changes introduced with multi-core CPUs have triggered a redesign of main...
Abstract. Join is the most important and expensive operation in relational databases. The parallel joi...
In this paper we examine the issues involved in adding concurrency to the Robin Hood hash table algo...
Hashing is one of the fundamental techniques used to implement query processing operators such as gr...