We study the problem of compressing massive tables within the partition-training paradigm introduced by Buchsbaum et al. [SODA’00], in which an off-line training procedure partitions a table into disjoint intervals of columns, each of which is compressed separately by a standard on-line compressor such as gzip. We provide a new theory that unifies previous experimental observations on partitioning and heuristic observations on column permutation, all of which are used to improve compression rates. Based on the theory, we devise the first on-line training algorithms for table compression, which can be applied to individual files and not just to continuously operating sources, as well as a new off-line training algorithm, based on a link to the...
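The abstract does not spell out how a partition is chosen. As a minimal sketch of the partition-training idea, it assumes three things: each row is a fixed-width byte record, zlib (the DEFLATE library behind gzip) serves as the base compressor, and the column order is fixed. Under those assumptions, a dynamic program over contiguous column intervals finds the split that minimizes total compressed size. The function names are hypothetical. This is not the paper's on-line or column-reordering algorithm.

```python
import zlib
from typing import List, Tuple


def compressed_size(rows: List[bytes], lo: int, hi: int) -> int:
    """Bytes zlib needs for columns lo..hi-1 of every row, laid out row-major."""
    block = b"".join(row[lo:hi] for row in rows)
    return len(zlib.compress(block, 9))


def optimal_partition(rows: List[bytes]) -> Tuple[int, List[Tuple[int, int]]]:
    """Split columns 0..n-1 into contiguous intervals minimizing total size.

    best[j] = min over i < j of best[i] + cost(i, j), an O(n^2) interval
    dynamic program in which every cost is one zlib call on the training rows.
    """
    n = len(rows[0])
    best = [0] + [float("inf")] * n  # best[j]: cheapest encoding of columns 0..j-1
    cut = [0] * (n + 1)              # cut[j]: start of the last interval ending at j
    for j in range(1, n + 1):
        for i in range(j):
            c = best[i] + compressed_size(rows, i, j)
            if c < best[j]:
                best[j], cut[j] = c, i
    # Walk the cut points back from n to recover the chosen intervals.
    intervals = []
    j = n
    while j > 0:
        intervals.append((cut[j], j))
        j = cut[j]
    intervals.reverse()
    return int(best[n]), intervals


if __name__ == "__main__":
    # Hypothetical table: two near-constant columns pairs followed by 4 noisy bytes.
    rows = [b"AABB" + (i * 2654435761 % 2**32).to_bytes(4, "big") for i in range(200)]
    total, parts = optimal_partition(rows)
    print(total, parts, compressed_size(rows, 0, 8))
```

Because compressing the whole table as one interval is always among the candidates, the result is never worse than plain zlib on the row-major data. Each cost evaluation touches every training row, so this exhaustive search suits narrow tables or a small training sample.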
Can we use machine learning to compress graph data? The absence of ordering in graphs poses a signif...
We introduce a problem class we call Polynomial Constraint Satisfaction Problems, or PCSP. Where the...
With the rise of datacenter virtualization, the number of entries in forwarding tables is e...
We study the problem of compressing massive tables within the partition-training paradigm introduced...
We study the problem of compressing massive tables. We devise a novel compression paradigm--training...
Sorting database tables before compressing them improves the compression rate. Can we do better than...
In this paper, we propose an improvement of the compression step of sliced tab...
Data compression techniques for massive tables are described. Related methodological results are als...
A pattern database (PDB) is a heuristic function implemented as a lookup table that stores the lengt...
Many industrial applications require the use of table constraints (e.g., in co...
A genetic algorithm is applied to a sparse table compression technique. The latter takes the form of...
We study the behaviour of an algorithm which compresses relational tables by representing ...
We first consider the problem of partitioning the edges of a graph G into bipartite cliques such tha...
In this paper we address the problem of trading optimally, and in a principled way, the compressed s...