Pattern mining is one of the best-known concepts in Data Mining. A big problem in pattern mining is that humongous amounts of patterns can be mined even from small datasets. This makes it hard for domain experts to discover knowledge using pattern mining, for example in the field of Bioinformatics. In this thesis we address the pattern explosion using compression. We argue that the best pattern set is that set of patterns that compresses the data best. Based on an analysis from MDL (Minimum Description Length) perspective, we introduce a heuristic algorithm, called Krimp, which finds the best set of patterns. High compression ratios and good classification scores confirm that Krimp selects patterns that are very characteristic for the data....
Mining small, useful, and high-quality sets of patterns has recently become an important topic in da...
Data Compression is today essential for a wide range of applications: for example Internet and the W...
The idea of using data compression algorithms for machine learning has been reinvented many times. I...
The discovery of patterns plays an important role in data mining. A pattern can be any type of regul...
Pattern mining based on data compression has been successfully applied in many data mining tasks. Fo...
Compression based pattern mining has been successfully applied to many data mining tasks. We propose...
Nowadays, relational databases have become the de facto standard to store large quantities of data. ...
Data mining provides methods that help to acquire insight in a data set automatically. One of its pr...
We propose a streaming algorithm, based on the minimal description length (MDL) principle, for extra...
We present a new method for clustering based on compression. The method doesn't use subject-spe...
A pattern database (PDB) is a heuristic function implemented as a lookup table that stores the lengt...
This paper presents an efficient framework for error-bounded compression of high-dimensional discret...
International audiencePattern Mining (PM) has a prominent place in Data Science and finds its applic...
We present an overview of data mining techniques for extracting knowledge from large databases with ...
We present a new technique to compress pattern databases to provide consistent heuristics without lo...
Mining small, useful, and high-quality sets of patterns has recently become an important topic in da...
Data Compression is today essential for a wide range of applications: for example Internet and the W...
The idea of using data compression algorithms for machine learning has been reinvented many times. I...
The discovery of patterns plays an important role in data mining. A pattern can be any type of regul...
Pattern mining based on data compression has been successfully applied in many data mining tasks. Fo...
Compression based pattern mining has been successfully applied to many data mining tasks. We propose...
Nowadays, relational databases have become the de facto standard to store large quantities of data. ...
Data mining provides methods that help to acquire insight in a data set automatically. One of its pr...
We propose a streaming algorithm, based on the minimal description length (MDL) principle, for extra...
We present a new method for clustering based on compression. The method doesn't use subject-spe...
A pattern database (PDB) is a heuristic function implemented as a lookup table that stores the lengt...
This paper presents an efficient framework for error-bounded compression of high-dimensional discret...
International audiencePattern Mining (PM) has a prominent place in Data Science and finds its applic...
We present an overview of data mining techniques for extracting knowledge from large databases with ...
We present a new technique to compress pattern databases to provide consistent heuristics without lo...
Mining small, useful, and high-quality sets of patterns has recently become an important topic in da...
Data Compression is today essential for a wide range of applications: for example Internet and the W...
The idea of using data compression algorithms for machine learning has been reinvented many times. I...