Semi-stream join algorithms join a fast stream input with a disk-based master data relation. A common class of these algorithms is derived from hash joins: they use the stream as the build input for a main hash table and also include a cache for frequently used master data. The composition of the cache is very important for performance; however, the decision of which master data to cache has so far been based solely on heuristics. We present the first formal criterion, a cache inequality, that leads to a provably optimal composition of the cache in a semi-stream many-to-many equijoin algorithm. We propose a novel algorithm, Semi-Stream Balanced Join (SSBJ), which exploits this cache inequality to achieve a given service rate with a provably minimal am...
We consider the problem of joining data streams using limited cache memory, with the goal of produci...
The authors identify some optimality properties of a special type of tree queries, namel...
Efficient and scalable stream joins play an important role in performing real-time analytics for man...
Semi-stream join algorithms join a fast data stream with a disk-based relation. This is important, f...
Efficient resource optimization is critical to manage the velocity and volume of real-time streaming...
Active or real-time data warehousing is becoming very popular in the business intelligence domain. In or...
Near real-time data warehousing is an important area of research, as business organisations want to ...
One problem encountered in real-time data integration is the join of a continuous incoming data stre...
Thesis (Ph.D.)--University of Washington, 2021. As the demand for data-intensive pipelines has grown a...
Join queries are a fundamental database tool, capturing a range of tasks that involve linking hetero...
High-performance analytical data processing systems often run on servers with large amounts of main ...
We optimize multiway equijoins on relational tables using degree information. We give a new bound th...
Due to high data volumes and unpredictable arrival rates, continuous query systems processing expens...
The problem of optimal query processing in distributed database systems was shown to be ...
This paper addresses the problem of computing approximate answers to continuou...