Join is the most important and expensive operation in relational databases. The parallel join operation is very sensitive to the presence of data skew. In this paper, we present two new parallel join algorithms for coarse-grained machines, which work optimally in the presence of an arbitrary amount of data skew. The first algorithm is sort-based and the second is hash-based. Both algorithms employ a preprocessing phase (prior to the redistribution phase) to partition the work equally among the processors. These algorithms are shown to be theoretically as well as practically scalable. Experimental results are provided on the IBM SP-2.
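To make the role of a preprocessing phase concrete, the following is a minimal sketch of why data skew hurts a naive redistribution and how a histogram-based preprocessing step can even out the load. It is not the sort-based or hash-based algorithm of the paper, whose details are not given in the abstract. The function names, the cost model (join work for a key estimated as |R_k| * |S_k|), and the greedy assignment rule are all assumptions chosen for illustration.

```python
from collections import Counter
import heapq


def join_work_per_key(r_keys, s_keys):
    """Assumed cost model: join work for key k is |R_k| * |S_k|."""
    r_hist, s_hist = Counter(r_keys), Counter(s_keys)
    return {k: r_hist[k] * s_hist[k] for k in r_hist if k in s_hist}


def hash_partition_load(work, p):
    """Per-processor load when keys are redistributed by k mod p, with no preprocessing."""
    load = [0] * p
    for k, w in work.items():
        load[k % p] += w
    return load


def balanced_partition_load(work, p):
    """Per-processor load when a preprocessing pass assigns keys greedily:
    heaviest key first, always to the currently least-loaded processor."""
    heap = [(0, proc) for proc in range(p)]
    heapq.heapify(heap)
    load = [0] * p
    for _, w in sorted(work.items(), key=lambda kv: -kv[1]):
        current, proc = heapq.heappop(heap)
        load[proc] = current + w
        heapq.heappush(heap, (load[proc], proc))
    return load


if __name__ == "__main__":
    # Hypothetical skewed input: keys 0 and 4 are frequent in both relations.
    r_keys = [0] * 8 + [4] * 8 + [1, 2, 3, 5, 6, 7]
    s_keys = [0] * 4 + [4] * 4 + [1, 2, 3, 5, 6, 7]
    work = join_work_per_key(r_keys, s_keys)
    print("hash:    ", hash_partition_load(work, 4))
    print("balanced:", balanced_partition_load(work, 4))
```

In this toy input both heavy keys hash to the same processor, so that processor does almost all of the join work, while the greedy assignment places them on different processors. The sketch never splits a single key across processors, so one extremely frequent key would still dominate; handling that case is precisely what a skew-optimal algorithm has to address.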
In this paper, we show that shared virtual memory, in a shared-nothing multiprocessor, facilitates t...
Evaluating the relational join is one of the central algorithmic and most well-studied problems in d...
Rapid advances in semiconductor technology have made it possible to build massively parallel process...
We present an approach to dealing with skew in parallel joins in database systems. Our approach is e...
Skew effects are still a significant problem for efficient query processing in parallel database sys...
A large number of parallel join algorithms has been proposed to maintain load-balancing in the prese...
The performance of joins in parallel database management systems is critical for data intensive oper...
Shared nothing multiprocessor architecture is known to be more scalable to support very large databa...
A consensus on parallel architecture for very large database management has emerged. This architectu...
We analyze the costs, and describe the implementation, of three hash-based join algorithms for a g...
The performance of parallel distributed data management systems becomes increasingly impor...
Outer joins are ubiquitous in databases and big data systems. The question of how best to e...