This paper introduces a class of probabilistic counting algorithms with which one can estimate the number of distinct elements in a large collection of data (typically a large file stored on disk) in a single pass using only a small additional storage (typically less than a hundred binary words) and only a few operations per element scanned. The algorithms are based on statistical observations made on bits of hashed values of records. They are by construction totally insensitive to the replicative structure of elements in the file; they can be used in the context of distributed systems without any degradation of performance and prove especially useful in the context of database query optimisation. © 1985 Academic Press, Inc.
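As an illustration of the idea of making statistical observations on bits of hashed values, here is a minimal Python sketch in the spirit of probabilistic counting with stochastic averaging. It is not taken from the abstract itself: the function names, the use of SHA-256 as the hash function, the default of 64 bitmaps, and the correction constant 0.77351 (the value commonly quoted from the Flajolet-Martin analysis) are all assumptions made for the example.

```python
import hashlib

# Assumed bias-correction constant, as commonly quoted for this estimator.
PHI = 0.77351


def hash64(item):
    """Map an item to a 64-bit integer (SHA-256 is an illustrative choice)."""
    digest = hashlib.sha256(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def lowest_set_bit(x):
    """0-based position of the least-significant 1-bit of x (x must be > 0)."""
    return (x & -x).bit_length() - 1


def lowest_zero_bit(bitmap):
    """0-based position of the least-significant 0-bit of bitmap."""
    pos = 0
    while bitmap & (1 << pos):
        pos += 1
    return pos


def estimate_distinct(items, num_maps=64):
    """Estimate the number of distinct items in a single pass.

    Each item is hashed; part of the hash selects one of num_maps bitmaps,
    and the position of the lowest 1-bit in the rest is recorded there.
    Duplicates set the same bit again, so they cannot change the result.
    """
    bitmaps = [0] * num_maps
    for item in items:
        h = hash64(item)
        index, rest = h % num_maps, h // num_maps
        if rest:  # rest == 0 carries no bit position; vanishingly rare
            bitmaps[index] |= 1 << lowest_set_bit(rest)
    mean_r = sum(lowest_zero_bit(b) for b in bitmaps) / num_maps
    return num_maps / PHI * 2 ** mean_r
```

Calling `estimate_distinct(records)` over any iterable returns a floating-point estimate. Because the bitmaps are combined with bitwise OR, sketches built on separate machines could in principle be merged by OR-ing them, which is consistent with the abstract's remark about distributed use. Accuracy improves as `num_maps` grows, and the estimate is unreliable when there are only a few distinct items per bitmap.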
This paper introduces new algorithms and data structures for quick counting for machine learning dat...
Due to the huge increase in the number and dimension of available databases, efficient solutions for...
Abstract. This paper introduces a class of probabilistic counting algorithms with which one can estima...
We present a probabilistic algorithm for counting the number of unique values in the presence of dup...
Abstract. This text is an informal review of several randomized algorithms that have appeared over t...
We analyze the storage/accuracy trade-off of an adaptive sampling algorithm due to Wegman that ma...
Counting items in a distributed system, and estimating the cardinality of multisets in particular, i...
Abstract. This paper develops two probabilistic methods that allow the analysis of the maximum data ...
This article considers the problem of cardinality estimation in data stream applications. We present...
Indexing massive data sets is extremely expensive for large scale problems. In...