Count queries belong to a class of summary statistics routinely used in basket analysis, inventory tracking, and study cohort finding. In this article, we demonstrate how simple count queries can be used to parallelize sequential data mining algorithms. Specifically, we parallelize a published algorithm for finding minimum sets of discriminating features and demonstrate that the parallel speedup is close to the expected optimum.
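As a rough illustration of why count queries lend themselves to parallel execution, the sketch below relies on one property: a count over a dataset equals the sum of the counts over any disjoint partition of it. This is not the published algorithm discussed in the article; the function names, the record layout, and the use of a thread pool are all assumptions made for the example. Threads are used only to show the decomposition, and a real deployment would more likely spread the partitions across processes or machines.

```python
from concurrent.futures import ThreadPoolExecutor


def count_query(rows, conditions):
    """Count rows that match every (attribute, value) pair in `conditions`."""
    return sum(
        all(row.get(attr) == value for attr, value in conditions.items())
        for row in rows
    )


def parallel_count(rows, conditions, workers=4):
    """Hypothetical helper: count each chunk independently, then add the partial counts."""
    chunk = max(1, -(-len(rows) // workers))  # ceiling division, at least 1
    parts = [rows[i:i + chunk] for i in range(0, len(rows), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = pool.map(lambda part: count_query(part, conditions), parts)
        return sum(partial_counts)


# Made-up records for illustration only.
rows = [
    {"smoker": "yes", "age_group": "40-59"},
    {"smoker": "no", "age_group": "40-59"},
    {"smoker": "yes", "age_group": "20-39"},
    {"smoker": "yes", "age_group": "40-59"},
    {"smoker": "no", "age_group": "60+"},
]
conditions = {"smoker": "yes", "age_group": "40-59"}

sequential = count_query(rows, conditions)
parallel = parallel_count(rows, conditions, workers=2)
```

Because the partial counts are simply added, the result does not depend on how the rows are split, which is what allows the workers to proceed without coordinating with each other.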
Currently, clustering applications use classical methods to partition a set of data (or objects) in ...
Abstract. Numerical data (e.g., DNA micro-array data, sensor data) pose a challenging problem to ex...
Numerical data (e.g., DNA micro-array data, sensor data) pose a challenging pr...
The fast increase in the size and number of databases demands data mining approaches that are scalab...
A novel fast algorithm for finding quasi identifiers in large datasets is presented. Performance mea...
An important issue in data mining is scalability with respect to the size of the dataset being min...
The goal of a data mining algorithm is to discover useful information embedded in large databases. Fre...
This paper presents a parallel feature selection method for classification that scales up to very hi...
This paper introduces new algorithms and data structures for quick counting for machine learning dat...
Ankara : The Department of Computer Engineering and the Graduate School of Engineering and Science o...
With the fast, continuous increase in the number and size of databases, parallel data mining is a na...
Data structures for efficient sampling from a set of weighted items are an important building block ...