Often, many records in a database share similar values for several attributes. If one can identify and group records that share similar values for some, even if not all, attributes, one gains not only a more parsimonious representation of the data but also useful insight into the data from an analysis and mining perspective. In this thesis, we introduce the notion of fascicles. A fascicle F(k,t) is a subset of records that has k compact attributes. An attribute A of a collection F of records is compact if the width of the range of A-values (for numeric attributes) or the number of distinct A-values (for categorical attributes) of all the records in F does not exceed t. We i...
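The compactness definition above can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: it only checks whether a given set of records qualifies, and does not search for fascicles. The names `compact_attributes`, `is_fascicle`, and the per-attribute `tolerances` mapping are hypothetical, and the sketch assumes that "has k compact attributes" means at least k, and that t is supplied separately for each attribute.

```python
from numbers import Number


def compact_attributes(records, tolerances):
    """Return the attributes that are compact over `records`.

    `tolerances` maps attribute name -> t (an assumed per-attribute form of t).
    Numeric attributes are measured by range width, all others by the number
    of distinct values.
    """
    if not records:
        return set()
    compact = set()
    for attr, t in tolerances.items():
        values = [r[attr] for r in records]
        numeric = all(
            isinstance(v, Number) and not isinstance(v, bool) for v in values
        )
        if numeric:
            measure = max(values) - min(values)  # width of the value range
        else:
            measure = len(set(values))  # count of distinct values
        if measure <= t:
            compact.add(attr)
    return compact


def is_fascicle(records, k, tolerances):
    """True if `records` has at least k compact attributes (assumed reading)."""
    return len(compact_attributes(records, tolerances)) >= k


# Illustrative data only; not taken from the thesis.
records = [
    {"price": 100, "weight": 2.0, "color": "red"},
    {"price": 104, "weight": 9.5, "color": "red"},
    {"price": 102, "weight": 5.0, "color": "blue"},
]
tolerances = {"price": 5, "weight": 1.0, "color": 2}
found = compact_attributes(records, tolerances)
```

Here `price` (range width 4) and `color` (2 distinct values) fall within their tolerances while `weight` (range width 7.5) does not, so the three records would qualify for k = 2 but not for k = 3.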
The tasks of extracting (top-K) Frequent Itemsets (FI’s) and Association Rules (AR’s) are fundamenta...
Pattern mining based on data compression has been successfully applied in many data mining tasks. Fo...
While a variety of lossy compression schemes have been developed for certain forms of digital data (...
This paper presents an efficient framework for error-bounded compression of high-dimensional discret...
A pattern database (PDB) is a heuristic function implemented as a lookup table that stores the lengt...
Pattern mining is one of the best-known concepts in Data Mining. A big problem in pattern mining is ...
The discovery of patterns plays an important role in data mining. A pattern can be any type of regul...
Nowadays, relational databases have become the de facto standard to store large quantities of data. ...
The collection indexing problem is defined as follows: Given a collection of highly similar strings,...
The past few years have witnessed several exciting results on compressed representation ...
One common pattern database compression technique is to merge adjacent database entries and store th...
We present a new method for clustering based on compression. The method doesn't use subject-spe...
We consider the problem of reducing a potentially very large dataset to a subset of representative p...
Relational datasets are being generated at an alarmingly rapid rate across organizations and industr...