Output-optimal parallel algorithms for similarity joins

Hu, Xiao
Tao, Yufei
Yi, Ke

Publication date

January 2017

DOI

10.1145/3034786.3056110

Publisher

Association for Computing Machinery (ACM)

Abstract

Parallel join algorithms have received much attention in recent years, due to the rapid development of massively parallel systems such as MapReduce and Spark. In the database theory community, most efforts have focused on studying worst-case optimal algorithms. However, the worst-case optimality of these join algorithms relies on hard instances that have very large output sizes. In the case of a two-relation join, the hard instance is just a Cartesian product, with an output size that is quadratic in the input size. In practice, however, the output size is usually much smaller. One recent parallel join algorithm by Beame et al. [8] has achieved output-optimality, i.e., its cost is optimal in terms of both the input size and the output size...
