This paper presents an efficient framework for error-bounded compression of high-dimensional discrete-attribute datasets. Such datasets, which frequently arise in a wide variety of applications, pose some of the most significant challenges in data analysis. Subsampling and compression are two key technologies for analyzing these datasets. The proposed framework, PROXIMUS, reduces large datasets to a much smaller set of representative patterns, on which traditional (expensive) analysis algorithms can be applied with minimal loss of accuracy. We show desirable properties of PROXIMUS in terms of runtime, scalability to large datasets, and performance in terms of capability to represent data in a compact form and dis...
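The abstract above does not spell out how PROXIMUS finds its representative patterns. The following is a minimal sketch, assuming the recursive binary rank-one decomposition that PROXIMUS is usually described with: approximate a binary matrix A by x y^T, where the pattern vector y is a binary row and x marks the rows y represents; keep y if every marked row lies within Hamming distance `eps` of it, otherwise recurse on the marked rows, and decompose the unmarked rows separately. The densest-row initialisation, the two fallbacks, and the names `rank_one`, `proximus`, and `hamming` are illustrative assumptions, not the authors' code.

```python
from typing import List, Tuple

Row = List[int]


def hamming(a: Row, b: Row) -> int:
    """Number of positions where two binary rows differ."""
    return sum(u != v for u, v in zip(a, b))


def rank_one(rows: List[Row], max_iter: int = 100) -> Tuple[List[int], Row]:
    """Alternating heuristic for a binary rank-one approximation A ~ x y^T.

    Each half-step minimises ||A - x y^T||_F^2 exactly over one binary vector
    while the other is fixed, so the approximation error never increases.
    """
    n_cols = len(rows[0])
    # Assumed initialisation: start the pattern vector y from the densest row.
    y = list(max(rows, key=sum))
    x = [0] * len(rows)
    for _ in range(max_iter):
        # Fix y: row i is assigned to the pattern iff 2 * (A y)_i > |y|.
        y_weight = sum(y)
        new_x = [1 if 2 * sum(a * b for a, b in zip(r, y)) > y_weight else 0
                 for r in rows]
        # Fix x: column j belongs to the pattern iff 2 * (A^T x)_j > |x|.
        x_weight = sum(new_x)
        new_y = [1 if 2 * sum(r[j] for r, xi in zip(rows, new_x) if xi) > x_weight else 0
                 for j in range(n_cols)]
        if new_x == x and new_y == y:
            break  # fixed point reached
        x, y = new_x, new_y
    return x, y


def proximus(rows: List[Row], eps: int) -> List[Tuple[Row, List[int]]]:
    """Split rows recursively until each row lies within Hamming distance eps of its
    group's pattern. Returns (pattern, row_indices) pairs covering every row once."""
    if not rows or eps < 0:
        raise ValueError("need at least one row and eps >= 0")
    result: List[Tuple[Row, List[int]]] = []

    def split(idx: List[int]) -> None:
        x, y = rank_one([rows[i] for i in idx])
        present = [i for i, xi in zip(idx, x) if xi]
        absent = [i for i, xi in zip(idx, x) if not xi]
        if not present:
            # Defensive: treat the whole group as a single candidate.
            present, absent = idx, []
        close = [i for i in present if hamming(rows[i], y) <= eps]
        far = [i for i in present if hamming(rows[i], y) > eps]
        if not far:
            result.append((y, present))  # y represents these rows within the bound
        elif len(present) < len(idx):
            split(present)  # bound fails, but the split made progress: refine
        elif close:
            # Assumed fallback (not from the source): x kept every row, so peel off
            # the rows y already bounds and recurse on the rest.
            result.append((y, close))
            split(far)
        else:
            # Assumed fallback: no row is within eps of y, so halve the group.
            mid = len(idx) // 2
            split(idx[:mid])
            split(idx[mid:])
        if absent:
            split(absent)

    split(list(range(len(rows))))
    return result
```

On a toy matrix made of two noisy blocks, `proximus(rows, 1)` returns one pattern per block, while `proximus(rows, 0)` additionally splits off the rows that differ from their block's pattern. Because every emitted group is checked against `eps` before it is kept, the error bound holds for every row by construction.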
Compression-based pattern mining has been successfully applied to many data mining tasks. We propose...
Attention to binary data coding has increased consistently over the last decade due to several re...
The need to cluster unknown data to better understand its relationship to known data ...
Pattern mining is one of the best-known concepts in data mining. A major problem in pattern mining is ...
We explore connections of low-rank matrix factorizations with interesting problems in data mining an...
We present a new method for clustering based on compression. The method doesn't use subject-spe...
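The abstract above is truncated before it says how similarity is measured. A common compression-based choice, and the one assumed in this sketch, is the normalized compression distance NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(s) is the compressed length of s and xy is the concatenation. Here zlib stands in for C, and the names `compressed_size`, `ncd`, and `distance_matrix` are hypothetical; the step that turns the distance matrix into clusters is not shown.

```python
import zlib
from typing import List


def compressed_size(data: bytes) -> int:
    """C(data): length of the zlib-compressed bytes (zlib is an assumed stand-in compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: smaller values indicate more shared structure."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)  # compress the pairwise concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(items: List[bytes]) -> List[List[float]]:
    """Pairwise NCD matrix that a separate clustering step could consume."""
    return [[ncd(a, b) for b in items] for a in items]
```

A highly repetitive string and a one-byte variant of it get a much smaller NCD than the same string paired with unrelated bytes, since the compressor reuses the shared structure when it sees the concatenation. With a real compressor NCD(x, x) is close to, but not exactly, 0, and deflate's 32 KiB window means inputs much larger than that see little cross-referencing.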
Data Mining deals with extracting valid, novel, easily understood by humans, potentially useful and ...
A key element in the success of data analysis is the strong contribution of visualization: dendrog...
Data compression is essential today for a wide range of applications, for example the Internet and the W...
Often many records in a database share similar values for several attributes. If one is able to ide...
Due to the increasing power of data acquisition and data storage technologies, a large amount of dat...
The idea of using data compression algorithms for machine learning has been reinvented many times. I...
With the availability of large-scale computing platforms and instrumentation for data gathering, inc...
Pattern mining based on data compression has been successfully applied in many data mining tasks. Fo...
Data compression, data prediction, data classification, learning and data mining are all strictly re...