Efficiently compressing string columnar data using frequent pattern mining

Wang, Xiaojian

Publication date

June 2016

Abstract

In modern column-oriented databases, compression is important for improving I/O throughput and overall database performance. Many string columns cannot be compressed by special-purpose algorithms such as run-length encoding or dictionary compression, and the typical choice for them is an LZ77-based compression algorithm such as GZIP or Snappy. These algorithms treat the data as a block of bytes and do not exploit its columnar nature. In this thesis, we develop a compression algorithm that uses frequent string patterns mined directly from a sample of a string column. The patterns serve as the dictionary phrases for compression. We discuss some interesting properties of frequent patterns in the context of compression, and develop a...
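The abstract does not spell out the mining or encoding procedure, so the following is only a minimal sketch of the general idea under our own assumptions, not the thesis's algorithm. It assumes that a pattern's support is the number of sampled rows containing it, that candidate patterns are substrings of bounded length, and that phrases are ranked by a rough savings estimate (support × (length − 1)). All function names and parameters (`mine_frequent_patterns`, `min_support`, `max_len`, `max_phrases`) are hypothetical. Each value is then encoded by greedy longest match against the phrase dictionary, with unmatched characters kept as literals so decoding is lossless.

```python
from collections import Counter
from typing import Dict, List, Union

# A token is either a dictionary phrase id (int) or a literal character (str).
Token = Union[int, str]


def mine_frequent_patterns(sample: List[str], min_support: int, min_len: int = 2,
                           max_len: int = 8, max_phrases: int = 256) -> List[str]:
    """Hypothetical miner: substrings found in at least `min_support` sampled rows."""
    support: Counter = Counter()
    for row in sample:
        # Count each distinct substring at most once per row (row-level support).
        seen = set()
        for i in range(len(row)):
            for n in range(min_len, min(max_len, len(row) - i) + 1):
                seen.add(row[i:i + n])
        support.update(seen)
    frequent = [p for p, s in support.items() if s >= min_support]
    # Assumed ranking heuristic: estimated savings, ties broken alphabetically.
    frequent.sort(key=lambda p: (-support[p] * (len(p) - 1), p))
    return frequent[:max_phrases]


def encode(value: str, index: Dict[str, int]) -> List[Token]:
    """Greedy longest-match encoding; characters not covered by a phrase stay literal."""
    longest = max((len(p) for p in index), default=0)
    out: List[Token] = []
    i = 0
    while i < len(value):
        # Try the longest candidate phrase first, shrinking down to length 1.
        for n in range(min(longest, len(value) - i), 0, -1):
            code = index.get(value[i:i + n])
            if code is not None:
                out.append(code)
                i += n
                break
        else:  # no phrase matched at position i
            out.append(value[i])
            i += 1
    return out


def decode(tokens: List[Token], phrases: List[str]) -> str:
    """Invert `encode` by expanding phrase ids and concatenating literals."""
    return "".join(phrases[t] if isinstance(t, int) else t for t in tokens)


if __name__ == "__main__":
    column = ["http://example.com/a", "http://example.com/b",
              "http://example.org/c", "https://example.com/d"]
    phrases = mine_frequent_patterns(column[:3], min_support=2)  # mine a sample only
    index = {p: i for i, p in enumerate(phrases)}
    for v in column:
        tokens = encode(v, index)
        assert decode(tokens, phrases) == v
        print(len(v), "chars ->", len(tokens), "tokens")
```

Note that the last value is not in the sample yet still benefits from the mined phrases, which is the point of mining from a sample. Token counts are only a proxy: a real encoder would also have to choose bit widths for phrase ids and literals and store the dictionary.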
