We introduce and study the problem of computing the similarity self-join in a streaming context (sssj), where the input is an unbounded stream of items arriving continuously. The goal is to find all pairs of items in the stream whose similarity is greater than a given threshold. The simplest formulation of the problem requires unbounded memory, and thus, it is intractable. To make the problem feasible, we introduce the notion of time-dependent similarity: the similarity of two items decreases with the difference in their arrival time. By leveraging the properties of this time-dependent similarity function, we design two algorithmic frameworks to solve the sssj problem. The first one, MiniBatch (MB), uses existing index-based filtering t...
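The effect of time-dependent similarity on memory can be illustrated with a minimal brute-force sketch. It is not the MiniBatch framework or any other algorithm from the abstract. It assumes, purely for illustration, cosine similarity over sparse vectors multiplied by an exponential decay factor exp(-lam * dt); the names `cosine`, `streaming_self_join`, `theta` and `lam` are hypothetical. Under that assumption, two items whose arrival times differ by more than ln(1/theta) / lam cannot exceed the threshold, so older items can be discarded and the buffer stays bounded.

```python
import math
from collections import deque


def cosine(x, y):
    """Cosine similarity of two sparse vectors stored as {dimension: weight} dicts."""
    dot = sum(w * y.get(d, 0.0) for d, w in x.items())
    nx = math.sqrt(sum(w * w for w in x.values()))
    ny = math.sqrt(sum(w * w for w in y.values()))
    return dot / (nx * ny) if nx and ny else 0.0


def streaming_self_join(stream, theta, lam):
    """Yield (older_id, newer_id, score) for pairs whose decayed similarity exceeds theta.

    Assumptions (not taken from the source): 0 < theta < 1, lam > 0,
    timestamps arrive in non-decreasing order, and the decay is exponential.
    """
    # Cosine is at most 1, so exp(-lam * dt) <= theta rules a pair out.
    horizon = math.log(1.0 / theta) / lam
    window = deque()  # items that may still match a future arrival
    for item_id, t, vec in stream:
        # Evict items that are too old to match this or any later item.
        while window and t - window[0][1] > horizon:
            window.popleft()
        # Brute-force comparison against the surviving items.
        for old_id, old_t, old_vec in window:
            score = cosine(old_vec, vec) * math.exp(-lam * (t - old_t))
            if score > theta:
                yield old_id, item_id, score
        window.append((item_id, t, vec))


if __name__ == "__main__":
    demo = [
        ("a", 0.0, {"x": 1.0}),
        ("b", 1.0, {"x": 1.0}),
        ("c", 100.0, {"x": 1.0}),  # identical content, but far too late
    ]
    for pair in streaming_self_join(demo, theta=0.5, lam=0.1):
        print(pair)
```

In the demo, only the pair (a, b) is reported: item c has the same content, but it arrives well past the horizon, by which time a and b have already been evicted.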
This paper introduces a class of join algorithms, termed W-join, for joining multiple infinite data ...
Streaming data analysis has recently attracted attention in numerous applications including telepho...
We study the problem of estimating the size of a matching when the graph is revealed in a streaming ...
We introduce and study the problem of computing the similarity self-join in a streaming context (s...
We provide efficient support for applications that aim to continuously find pairs of similar sets in...
Similarity join (SJ) in time-series databases has a wide spectrum of applications such as data clean...
Nowadays, online monitoring of data streams is essential in many real-life applications, like sensor ...
Continuously identifying pre-defined patterns in a streaming time series has strong demand in variou...
Similarity join processing in the streaming environment has many practical applications such as sens...
Large network analysis is a very important topic in data mining. A significant body of wor...
Similarity search over stream time series has a wide spectrum of applications. Most previous work in...
We investigate adaptive buffer management techniques for approximate evaluation of sliding window jo...
We investigate the problem of deterministic pattern matching in multiple streams. In this model, one...
In this thesis, we give efficient algorithms and near-tight lower bounds for the following problems ...
Similarity Joins are some of the most useful and powerful data processing techniques. They...