Summarization: While a variety of lossy compression schemes have been developed for certain forms of digital data (e.g., images, audio, video), the area of lossy compression techniques for arbitrary data tables has been left relatively unexplored. Nevertheless, such techniques are clearly motivated by the ever-increasing data collection rates of modern enterprises and the need for effective, guaranteed-quality approximate answers to queries over massive relational data sets. In this paper, we propose SPARTAN, a system that takes advantage of attribute semantics and data-mining models to perform lossy compression of massive data tables. SPARTAN is based on the novel idea of exploiting predictive data correlations and prescribed error-toleranc...
We formulate a conceptual model for white-box compression, which represents the logical columns in t...
Today's extreme-scale high-performance computing (HPC) applications are producing volumes of data...
In data mining it is important for any transforms made to training data to be replicated on evaluat...
Relational datasets are being generated at an alarmingly rapid rate across organizations and industr...
With the ever-increasing volumes of data produced by today's large-scale scientific simulations, ...
Real datasets are often large enough to necessitate data compression. Traditional ‘syntactic’ data ...
Columnar databases have dominated the data analysis market for their superior performance in query p...
Today's scientific simulations require a significant reduction of the data size because of extrem...
Large-scale data generation, acquisition, and processing are happening at every moment in our society...
An effective data compressor is becoming increasingly critical to today's scientific research, an...
This paper presents an efficient framework for error-bounded compression of high-dimensional discret...
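A core primitive behind error-bounded lossy compression, as referenced in abstracts like this one, can be sketched briefly (this is a generic illustration, not necessarily this paper's framework): uniform scalar quantization with an absolute error bound, where each value maps to an integer bin of width twice the bound, so every reconstructed value is guaranteed within the bound. The entropy coding of the resulting integer stream is omitted.

```python
# Hedged sketch of error-bounded quantization, the primitive underlying
# many error-bounded lossy compressors. Bin width is 2*eb, so the
# round-trip error is at most eb for every value.

def quantize(values, eb):
    """Map each value to an integer bin index."""
    return [round(v / (2 * eb)) for v in values]

def dequantize(bins, eb):
    """Reconstruct each value as its bin center; error is <= eb."""
    return [b * 2 * eb for b in bins]

data = [0.01, 0.49, 1.02, 3.7, -2.2]
eb = 0.5
codes = quantize(data, eb)
recon = dequantize(codes, eb)
assert all(abs(x, ) if False else abs(x - y) <= eb for x, y in zip(data, recon))
```

The integer codes cluster heavily when the data are smooth, which is what makes the subsequent lossless encoding stage effective.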
We study the problem of compressing massive tables. We devise a novel compression paradigm--training...
Through this study, we propose two algorithms. The first algorithm describes the concept of compress...