The duplicate-insensitive, time-decayed sum of an arbitrary subset of a stream is an important aggregate for many analyses in distributed stream scenarios. In general, computing this sum exactly over an unbounded, high-rate stream is infeasible. We therefore address this problem and introduce a sketch, the time-decaying Bloom Filter (TDBF). The TDBF detects duplicates in a stream while dynamically maintaining the decayed weight of every distinct element in the stream according to a user-specified decay function. For a query on the current decayed sum of a subset of the stream, the TDBF provides an effective estimate. In our theoretical analysis, we give a provable approximation guarantee for the error of the es...
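To make the idea of a duplicate-insensitive, time-decayed subset sum concrete, the following is a minimal illustrative sketch, not the TDBF construction from the paper (whose details are not given in this abstract). It assumes a Bloom-filter-like array in which each cell stores the most recent arrival timestamp hashed to it, an exponential decay function, and that a repeated element simply refreshes its timestamp. The class name `DecayingFilterSketch`, its parameters, and the SHA-256-based hashing are all hypothetical choices made for illustration.

```python
import hashlib
import math


class DecayingFilterSketch:
    """Illustrative only: cells hold the latest timestamp hashed to them."""

    def __init__(self, num_cells=1024, num_hashes=4, decay_rate=0.1):
        self.num_cells = num_cells
        self.num_hashes = num_hashes
        self.decay_rate = decay_rate          # assumed exponential decay rate
        self.cells = [None] * num_cells       # None means "never written"

    def _positions(self, item):
        # Derive num_hashes cell indices from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_cells

    def insert(self, item, timestamp):
        # Re-inserting an item at the same time leaves the state unchanged,
        # so duplicates do not add weight; a later arrival refreshes the cells.
        for pos in self._positions(item):
            if self.cells[pos] is None or self.cells[pos] < timestamp:
                self.cells[pos] = timestamp

    def decayed_weight(self, item, now):
        stamps = [self.cells[pos] for pos in self._positions(item)]
        if any(s is None for s in stamps):
            return 0.0                        # definitely never inserted
        # Collisions can only make a cell newer, so the oldest stamp is the
        # best available estimate; the weight is never underestimated.
        age = max(0.0, now - min(stamps))
        return math.exp(-self.decay_rate * age)

    def decayed_sum(self, items, now):
        # Sum over distinct query elements only.
        return sum(self.decayed_weight(x, now) for x in set(items))
```

Under these assumptions, a query such as `sketch.decayed_sum(["a", "b"], now=10)` returns an estimate that can exceed the true decayed sum when hash collisions occur but never falls below it.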
This thesis is concerned with the study of problems related to the measurement of disorder in the da...
Bloom filters are space-efficient data structures for fast set membership quer...
Bloom filters are efficient randomized data structures for membership queries on a set with a certai...
Detecting duplicates in data streams is an important problem that has a wide range of applications. ...
Conference also known as: ICCSIT 2010. Approximate duplicate detection based on the Decaying Bloom Fil...
The unparalleled growth and popularity of the Internet coupled with the advent of diverse modern ap...
Data-intensive applications and computing have emerged as a central area of modern research with the...
In computing, duplicate data detection refers to identifying duplicate copies of repeating data. Ide...
We deal with the problem of detecting frequent items in a stream under the constraint that items are...
The set is widely used as a basic data structure. However, when it is used for large-scale data ...
We consider the problem of estimating the frequency count of data stream elements under poly...
Several traffic monitoring applications may benefit from the availability of efficient mechanisms fo...
A counting Bloom filter (CBF) generalizes a Bloom filter data structure so as to allow mem...
The Bloom Filter (BF), a space-and-time-efficient hashcoding method, is used as one of the fundament...