Ad hoc querying is difficult on very large datasets, since it is usually not possible to have the entire dataset on disk. While compression can be used to decrease the size of the dataset, compressed data is notoriously difficult to index or access. In this paper we consider a very large dataset comprising multiple distinct time sequences. Each point in a sequence is a numerical value. We show how to compress such a dataset into a format that supports ad hoc querying, provided that a small error can be tolerated when the data is uncompressed. Experiments on large, real-world datasets (AT&T customer calling patterns) show that the proposed method achieves an average of less than 5% error in any data value after compressing to a mere 2....
The world is drowning in data. The recent explosion of web publishing, XML data, bioinformation, sci...
The ongoing trend for data gathering not only produces larger volumes of data, but also increases th...
Bitmap indices have been widely and successfully used in scientific and commercial databases. Compre...
Ad hoc querying is difficult on very large datasets, since it is usually not possible to have the en...
Evolving customer requirements and increasing competition force business organizations to store incr...
As a key issue in distributed monitoring, time series data are a series of values collecte...
We introduce a new technique for the efficient management of large sequences of multi-dimensional da...
Current research in indexing and mining time series data has produced many interesting algorithms an...
We present our approach to enabling approximate ad hoc queries on terabyte-scale mesh data generated...
The efficient indexing of large and sparse N-gram datasets is crucial in several applications in Info...
We describe a procedure for identifying major minima and maxima of a time series, and present two ap...
As advances in science and technology have continually increased the existence of, and capability fo...
Indexing is crucial for many data mining tasks that rely on efficient and effe...
Bitmap indices are widely used for large read-only repositories in data warehouses and scie...
The detection of similarities within the time series provided by the Google \(n\)-gram data can hel...