Ad hoc querying is difficult on very large datasets, since it is usually not possible to have the entire dataset on disk. While compression can be used to decrease the size of the dataset, compressed data is notoriously difficult to index or access. In this paper we consider a very large dataset comprising multiple distinct time sequences. Each point in a sequence is a numerical value. We show how to compress such a dataset into a format that supports ad hoc querying, provided that a small error can be tolerated when the data is decompressed. Experiments on large, real-world datasets (AT&T customer calling patterns) show that the proposed method achieves an average of less than 5% error in any data value after compressing to a mere 2.5%...
We describe a procedure for identifying major minima and maxima of a time series, and present two ap...
We consider the problem of finding similar patterns in a time sequence. Typical application...
The detection of similarities within the time series provided by the Google \(n\)-gram data can hel...
Evolving customer requirements and increasing competition force business organizations to store incr...
Current research in indexing and mining time series data has produced many interesting algorithms an...
As a key issue in distributed monitoring, time series data are a series of values collecte...
We introduce a new technique for the efficient management of large sequences of multi-dimensional da...
The efficient indexing of large and sparse N-gram datasets is crucial in several applications in Info...
As advances in science and technology have continually increased the existence of, and capability fo...
Indexing is crucial for many data mining tasks that rely on efficient and effe...
We present our approach to enabling approximate ad hoc queries on terabyte-scale mesh data generated...
Bitmap indices are widely used for large read-only repositories in data warehouses and scie...
Bitmap indices have been widely and successfully used in scientific and commercial databases. Compre...