We present an algorithm to extract a high-quality approximation of the (top-k) frequent itemsets (FIs) from random samples of a transactional dataset. With high probability the approximation is a superset of the FIs, and no itemset with frequency much lower than the threshold is included in it. The algorithm employs progressive sampling, with a stopping condition based on bounds on the empirical Rademacher average, a key concept from statistical learning theory. The computation of the bounds uses characteristic quantities that can be obtained efficiently with a single scan of the sample. Therefore, evaluating the stopping condition is fast, and does not require an expensive mining of each sample. Our experimental evaluation confirms the pr...
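The progressive-sampling loop described in this abstract can be illustrated with a minimal Python sketch. Everything named here is hypothetical: `mine`, `toy_bound`, and `progressive_fis` are illustrative helpers, not the authors' code. In particular, `toy_bound` is a stand-in (Hoeffding's inequality with a union bound over all itemsets on the items seen in the sample), not the Rademacher-average bound the abstract refers to, and it is not corrected for checking the condition repeatedly. The sketch only shows the control flow: grow the sample, evaluate a cheap bound `eta` on the maximum frequency deviation, stop once `eta <= eps / 2`, then mine the sample once at the lowered threshold `theta - eta`. If every sample frequency is within `eta` of the true one, that output contains every itemset with true frequency at least `theta` and none with true frequency below `theta - 2 * eta`.

```python
import random
from itertools import combinations
from math import log, sqrt


def mine(transactions, min_freq):
    """Brute-force miner for toy data: {itemset tuple: frequency}."""
    n = len(transactions)
    counts = {}
    for t in transactions:
        items = sorted(t)
        for k in range(1, len(items) + 1):
            for itemset in combinations(items, k):
                counts[itemset] = counts.get(itemset, 0) + 1
    return {s: c / n for s, c in counts.items() if c / n >= min_freq}


def toy_bound(sample, delta):
    """Stand-in deviation bound (NOT the Rademacher-based bound).

    Hoeffding plus a union bound over the 2^d itemsets on the d items
    observed in the sample; computed with a single scan of the sample.
    """
    d = len(set().union(*sample))
    return sqrt((d * log(2) + log(2 / delta)) / (2 * len(sample)))


def progressive_fis(dataset, theta, eps, delta, bound=toy_bound,
                    start=100, growth=2, max_size=1_000_000, seed=0):
    """Grow a random sample until bound(sample) <= eps / 2, then mine once."""
    rng = random.Random(seed)
    sample = []
    target = start
    while target <= max_size:
        # Draw only the additional transactions (with replacement).
        sample.extend(rng.choices(dataset, k=target - len(sample)))
        eta = bound(sample, delta)
        if eta <= eps / 2:
            # Lowering the threshold by eta keeps every true FI in the output.
            return mine(sample, theta - eta), len(sample)
        target *= growth
    raise RuntimeError("stopping condition not met within max_size")
```

The stopping check never calls `mine`, which mirrors the point made in the abstract: the expensive mining step runs only once, on the final sample.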
A sequential pattern is a sequence of sets of items. Mining sequential patterns from very large data...
Mining frequent itemsets from transactional datasets is a well known problem with good algorithmic s...
Association rule discovery based on support-confidence framework is an importa...
We study the use of sampling for efficiently mining the top-K frequent itemsets of cardina...
We study the use of sampling for efficiently mining the top-K frequent itemsets of cardinality at m...
Frequent Itemsets (FIs) mining is a fundamental primitive in knowledge discovery. It requires to ide...
The tasks of extracting (top-K) Frequent Itemsets (FI’s) and Association Rules (AR’s) are fundamenta...
The tasks of extracting (top-K) Frequent Itemsets (FI’s) and Association Rules (AR’s) are fundamenta...
Sequential pattern mining is a fundamental data mining task with application in several domains. We ...
Frequent-itemset mining only considers the frequency of occurrence of the items but does...
We present ProSecCo, an algorithm for the progressive mining of frequent sequences from la...
In this paper, we focus on the problem of mining the approximate frequent itemsets. To impro...
Frequent itemset mining is a classical data mining task with a broad range of applications, includin...
Most of the complexity of common data mining tasks is due to the unknown amount of information conta...
Within data mining, the efficient discovery of frequent patterns—sets of items that occur together ...