Partitioning is an important step in several database algorithms, including sorting, aggregation, and joins. Partitioning is also fundamental for dividing work into equal-sized (or balanced) parallel subtasks. In this paper, we aim to find, materialize, and maintain a set of partitioning elements (splitters) for a data set. Unlike traditional partitioning elements, our splitters define both inequality and equality partitions, which allows us to bound the size of the inequality partitions. We provide an algorithm for determining an optimal set of splitters from a sorted data set and show that it has time complexity O(k log₂ N), where k is the number of splitters requested and N is the size of the data set. We show how the algorithm can be ext...
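The abstract says splitters can be computed from a sorted data set in O(k log₂ N) time, but the snippet is cut off before the algorithm itself. The Python sketch below is only an illustration, and it is not the paper's optimal algorithm. It uses a simple equal-rank heuristic instead. For the j-th candidate, it takes the element at rank ⌊jN/(k+1)⌋. It then runs two binary searches (`bisect_left`, `bisect_right`) to find the equality partition that holds every copy of that value. Each splitter therefore costs O(log N), and the whole pass costs O(k log N).

The helper names `choose_splitters` and `partition_sizes` are hypothetical. All copies of a splitter value go into its equality partition, so each inequality partition in this sketch holds at most ⌈N/(k+1)⌉ elements. This bound holds even when a single value is heavily duplicated.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence, Tuple


def choose_splitters(data: Sequence[int], k: int) -> List[Tuple[int, int, int]]:
    """Hypothetical helper: pick up to k distinct splitter values from sorted `data`.

    Returns (value, lo, hi) triples where data[lo:hi] is the equality partition
    holding every copy of `value`. Illustrative equal-rank heuristic only,
    not the optimal algorithm described in the abstract.
    """
    n = len(data)
    splitters: List[Tuple[int, int, int]] = []
    if n == 0 or k <= 0:
        return splitters
    for j in range(1, k + 1):
        target = j * n // (k + 1)          # ideal rank of the j-th boundary; always < n
        value = data[target]
        if splitters and splitters[-1][0] == value:
            continue                       # value already covered by the previous equality partition
        lo = bisect_left(data, value)      # first copy of value, O(log n)
        hi = bisect_right(data, value)     # one past the last copy, O(log n)
        splitters.append((value, lo, hi))
    return splitters


def partition_sizes(n: int, splitters: List[Tuple[int, int, int]]) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: sizes of the inequality partitions (between splitters)
    and of the equality partitions (copies of each splitter value)."""
    inequality: List[int] = []
    equality: List[int] = []
    prev_hi = 0
    for _, lo, hi in splitters:
        inequality.append(lo - prev_hi)    # elements strictly between the two splitter values
        equality.append(hi - lo)           # elements equal to this splitter value
        prev_hi = hi
    inequality.append(n - prev_hi)         # elements above the last splitter
    return inequality, equality


if __name__ == "__main__":
    data = [1, 2, 2, 2, 2, 2, 3, 4, 5, 6, 7, 8]   # already sorted; value 2 is heavily duplicated
    sp = choose_splitters(data, 3)
    print(sp)                                    # [(2, 1, 6), (3, 6, 7), (6, 9, 10)]
    print(partition_sizes(len(data), sp))        # ([1, 0, 2, 2], [5, 1, 1])
```

In the example, the five copies of `2` land in a single equality partition rather than spilling across inequality partitions. Duplicate candidates are merged, so the sketch may return fewer than k splitters when the data is skewed.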
In shared-disk database systems, disk access has to be scheduled properly to avoid unnecessary conte...
Efficient join processing is one of the most fundamental and well-studied tasks in database research...
Join is the most important and expensive operation in relational databases. The parallel joi...
Multiprocessor implementation of the relational database operators has recently received great atten...
A splittable good provided in n pieces shall be divided as evenly as possible among m agents, where ...
Vertical partitioning is the process of subdividing the attributes of a relation into groups, creati...
Inequality joins, which join relational tables on inequality conditions, are used in variou...
Big data analytics often involves complex join queries over two or more tables. Such join process...
We present Schism, a novel workload-aware approach for database partitioning and replication designe...
In this paper we present the Sandwich Operators, an elegant approach to exploit pre-sorting or pre-g...
Massive scale data stores, which exhibit highly desirable scalability and availability properties ar...
Table partitioning splits a table into smaller parts that can be accessed, stored, and maintained in...
We propose a methodology for optimal k-way partitioning with replication of directed hypergraphs via...
Physical database design is important for query performance in a shared-nothing parallel database sy...
Efficient query processing is a critical requirement for data warehousing systems as decision suppor...