Compaction plays a crucial role in NoSQL systems in ensuring high overall read throughput. In this work, we formally define compaction as an optimization problem that seeks to minimize disk I/O, and we prove this problem to be NP-hard. We then propose a set of algorithms and mathematically analyze upper bounds on their worst-case cost. We evaluate the proposed algorithms on real-life workloads. Our results show that the algorithms incur low I/O costs and that a compaction approach using a balanced tree is preferable.
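To make the optimization view concrete, the following is a minimal sketch rather than the algorithms proposed in this work. It assumes that each table is a set of keys, that a merge of two tables produces their union, and that the I/O cost of one merge is the number of keys read plus the number of keys written. These assumptions, and the names `merge`, `balanced_tree_compaction`, and `left_to_right_compaction`, are illustrative only. The sketch compares a balanced-tree merge order with a simple left-to-right order under that cost model.

```python
from typing import FrozenSet, List, Tuple

Table = FrozenSet[str]  # assumption: a table is modeled as a set of keys


def merge(a: Table, b: Table) -> Tuple[Table, int]:
    """Merge two tables; assumed cost = keys read from both inputs + keys written."""
    out = a | b
    return out, len(a) + len(b) + len(out)


def balanced_tree_compaction(tables: List[Table]) -> Tuple[Table, int]:
    """Merge tables pairwise, level by level, like a balanced binary tree."""
    if not tables:
        raise ValueError("need at least one table")
    level = list(tables)
    total = 0
    while len(level) > 1:
        nxt: List[Table] = []
        # Merge neighbours (0,1), (2,3), ... on the current level.
        for i in range(0, len(level) - 1, 2):
            out, cost = merge(level[i], level[i + 1])
            nxt.append(out)
            total += cost
        # An unpaired last table moves up to the next level unchanged.
        if len(level) % 2 == 1:
            nxt.append(level[-1])
        level = nxt
    return level[0], total


def left_to_right_compaction(tables: List[Table]) -> Tuple[Table, int]:
    """Baseline: fold each table into one growing accumulator."""
    if not tables:
        raise ValueError("need at least one table")
    acc = tables[0]
    total = 0
    for t in tables[1:]:
        acc, cost = merge(acc, t)
        total += cost
    return acc, total


if __name__ == "__main__":
    tables = [frozenset({k}) for k in "abcd"]
    print(balanced_tree_compaction(tables)[1])   # cost under the assumed model
    print(left_to_right_compaction(tables)[1])
```

Under this assumed cost model, the left-to-right order rereads the growing accumulator at every step, whereas the balanced order keeps the inputs of each merge similar in size. Real systems use different cost models and merge more than two tables at a time, so the numbers are only indicative.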
Metric Access Methods (MAM) are employed to accelerate the processing of similarity queries, such as...
Evaluating a query can involve manipulation of large volumes of temporary data. When the ...
Declustering is a well known strategy to achieve maximum I/O parallelism in multi-disk systems. Many...
NoSQL databases are widely used for massive data storage and real-time web applications. Yet importa...
Background: Cassandra is a NoSQL database, where the data in the background is stored in the immutab...
We initiate the formal study of the online stack-compaction policies used by big-data NoSQL database...
Context: The global communication system is in a tremendous growth, leading to wide range of data ge...
The age of Big data has transformed into the era of Internet of Things (IoT) where massive scale dat...
Context. The present trend in a large variety of applications are ranging from the web and social ne...
In this paper, we propose a new bulk-loading technique for high-dimensional indexes which represent ...
We present a formal analysis of the database layout problem, i.e., the problem of determining how da...
There are a variety of main-memory access structures, such as segment trees, and quad trees, whose p...
The amount of internet-connected devices is rapidly expanding. Embedded with various sensors, these ...