Exploiting Join Cardinality for Faster Hash Joins

Michael Henderson
Bryce Cutt
Ramon Lawrence

Publication date

March 2015

Abstract

Hash joins combine massive relations in data warehouses, decision support systems, and scientific data stores. Faster hash join performance significantly improves query throughput, response time, and overall system performance. In this work, we demonstrate how using join cardinality improves hash join performance. The key contribution is the development of an algorithm to determine join cardinality in an arbitrary query plan. We implemented early hash join and the join cardinality algorithm in PostgreSQL. Experimental results demonstrate that early hash join has an immediate response time that is an order of magnitude faster than the existing hybrid hash join implementation. One-to-one joins execute up to 50% faster and perform significa...