Big data systems such as relational databases, data science platforms, and scientific workflows all process queries over large and complex datasets. Skew is common in these real-world datasets and workloads, and different types of skew affect query-processing performance in different ways. Although skew sometimes causes load imbalance in a parallel execution environment, hurting query performance, we demonstrate in this thesis that in many cases we can actually improve query performance in the presence of skew. To optimize query processing under skew, we develop a set of techniques that exploit the positive effects of skew and avoid the negative ones. In order to exploit skew, we propose techniques including: (...
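To make the load-imbalance point concrete, the following minimal Python sketch hash-partitions a hypothetical workload containing one heavy-hitter key across p workers, then compares it with a simple skew-aware variant that spreads the heavy key's tuples round-robin. The workload, the `key % p` partitioning function, and the frequency threshold are illustrative assumptions; this is not one of the thesis's own techniques, whose list is truncated above.

```python
from collections import Counter


def hash_partition(tuples, p):
    """Send each (key, value) tuple to worker key % p, as a hash-partitioned join or group-by would."""
    loads = [0] * p
    for key, _ in tuples:
        loads[key % p] += 1
    return loads


def skew_aware_partition(tuples, p, threshold):
    """Hash-partition light keys; spread tuples of heavy keys (frequency > threshold) round-robin.

    Illustrative assumption: heavy keys are detected from exact frequencies; real systems
    typically estimate them from samples or histograms.
    """
    freq = Counter(key for key, _ in tuples)
    loads = [0] * p
    next_worker = 0
    for key, _ in tuples:
        if freq[key] > threshold:
            loads[next_worker] += 1
            next_worker = (next_worker + 1) % p
        else:
            loads[key % p] += 1
    return loads


def imbalance(loads):
    """Ratio of the most loaded worker to the average load (1.0 means perfect balance)."""
    return max(loads) / (sum(loads) / len(loads))


# Hypothetical workload: key 0 is a heavy hitter with 1000 tuples; keys 1..99 have 10 tuples each.
tuples = [(0, i) for i in range(1000)] + [(k, i) for k in range(1, 100) for i in range(10)]
p = 10

print(hash_partition(tuples, p))                       # [1090, 100, 100, 100, 100, 100, 100, 100, 100, 100]
print(skew_aware_partition(tuples, p, threshold=100))  # [190, 200, 200, 200, 200, 200, 200, 200, 200, 200]
```

With plain hashing, the worker that owns key 0 receives more than five times the average load, so it determines the completion time of the whole stage. In a join, spreading one input's heavy-key tuples this way is only correct if the matching tuples from the other input are replicated to every worker that received them (for example, by broadcasting them), which trades extra communication for balance.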
Large relational databases are a part of all of our lives. The government uses them and almost any s...
Current cluster computing frameworks suffer from load imbalance and limited parallelism due to skewe...
Massive data analysis in cloud-scale data centers plays a crucial role in making critical b...
As queries grow increasingly complex and large data sets become prevalent, Parallel Query Proc...
We study the problem of computing a conjunctive query q in parallel, using p servers, on a large ...
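For readers unfamiliar with this one-round setting, the following is a minimal sketch of the HyperCube (also called "shares") algorithm, a well-known way to evaluate a conjunctive query on p servers with a single round of communication. It is offered as an illustration only, not as the method of the abstract above, which is truncated. The triangle query q(x, y, z) :- R(x, y), S(y, z), T(z, x), the choice p = share**3, and the hypothetical `bucket` hash (`value % share`) are all assumptions.

```python
from itertools import product


def bucket(value, share):
    """Hypothetical hash function: map an integer attribute value to one of `share` buckets."""
    return value % share


def hypercube_triangles(R, S, T, share):
    """One communication round of HyperCube for q(x, y, z) :- R(x, y), S(y, z), T(z, x).

    The p = share**3 servers form a share x share x share grid indexed by (x, y, z) buckets.
    Returns the query answers and the tuples each server received.
    """
    servers = {c: ([], [], []) for c in product(range(share), repeat=3)}
    for x, y in R:  # R fixes the x and y coordinates; replicate along z
        for k in range(share):
            servers[(bucket(x, share), bucket(y, share), k)][0].append((x, y))
    for y, z in S:  # S fixes y and z; replicate along x
        for i in range(share):
            servers[(i, bucket(y, share), bucket(z, share))][1].append((y, z))
    for z, x in T:  # T fixes z and x; replicate along y
        for j in range(share):
            servers[(bucket(x, share), j, bucket(z, share))][2].append((z, x))

    answers = set()
    for r_local, s_local, t_local in servers.values():  # each server joins only its own fragment
        t_set = set(t_local)
        for x, y in r_local:
            for y2, z in s_local:
                if y2 == y and (z, x) in t_set:
                    answers.add((x, y, z))
    return answers, servers


def centralized_triangles(R, S, T):
    """Reference single-machine evaluation of the same query."""
    t_set = set(T)
    return {(x, y, z) for x, y in R for y2, z in S if y2 == y and (z, x) in t_set}


# Hypothetical example: use one small directed edge list for R, S, and T.
edges = [(0, 1), (1, 2), (2, 0), (1, 3), (3, 0), (2, 3)]
answers, servers = hypercube_triangles(edges, edges, edges, share=2)
loads = [sum(len(rel) for rel in frag) for frag in servers.values()]
print(len(servers), sum(loads))  # 8 36  (8 servers; 3 relations x 6 tuples x 2 copies each)
```

Every answer's three tuples meet at server (bucket(x), bucket(y), bucket(z)), so the union of the local results equals the centralized answer, and each tuple is sent to share = p^(1/3) servers. The hashing is also where skew matters: if one y value is very frequent, all of its R and S tuples land in the same y-slice of the grid, overloading those share**2 servers.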
Skew effects are still a significant problem for efficient query processing in parallel database sys...
We present an approach to dealing with skew in parallel joins in database systems. Our approach is e...
The healthcare industry has generated large amounts of data, and analyzing these has emerged as an i...
Skew effects are a serious problem in parallel database systems, but the relationship between differ...
Outer joins are ubiquitous in databases and big data systems. The question of how best to e...
Thesis (Ph.D.)--University of Washington, 2015. The need to analyze and understand big data has change...
Thesis (Ph.D.)--University of Washington, 2012. Science and business are generating data at an unprece...
Outer joins are ubiquitous in databases and big data systems. The question of how best to execute ou...
Over the past decade, a number of data-intensive scalable systems have been developed to process ext...