The relative ease of collaborative data science and analysis has led to a proliferation of many thousands or millions of versions of the same datasets in many scientific and commercial domains, acquired or constructed at various stages of data analysis across many users, and often over long periods of time. Managing, storing, and recreating these dataset versions is a non-trivial task. The fundamental challenge here is the storage-recreation trade-off: the more storage we use, the faster it is to recreate or retrieve versions, while the less storage we use, the slower it is to recreate or retrieve versions. Despite the fundamental nature of this problem, there has been surprisingly little work on it. In this paper, we study this...
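The storage-recreation trade-off described above can be illustrated with a toy cost model. This is a minimal sketch under stated assumptions, not the method of the paper: it assumes a linear chain of versions, where each version is either stored in full or as a delta against its predecessor, and it measures recreation cost as the number of bytes read to rebuild a version. The function `plan_costs` and all sizes are hypothetical.

```python
# Toy model of the storage-recreation trade-off (illustrative assumption,
# not the paper's algorithm). Each version in a linear chain is stored
# either in full or as a delta against the previous version.

def plan_costs(full_sizes, delta_sizes, materialized):
    """Return (total_storage, recreation_costs) for a linear version chain.

    full_sizes[i]  -- size of version i when stored in full
    delta_sizes[i] -- size of the delta from version i-1 to i (index 0 unused)
    materialized   -- set of version indices stored in full; must contain 0
    The recreation cost of a version is the number of bytes read to rebuild it.
    """
    if 0 not in materialized:
        raise ValueError("version 0 has no parent and must be materialized")
    total_storage = 0
    recreation = []
    for i, full in enumerate(full_sizes):
        if i in materialized:
            # Stored in full: costs more space, but is read directly.
            total_storage += full
            recreation.append(full)
        else:
            # Stored as a delta: cheap to keep, but the previous version
            # has to be rebuilt first and the delta applied on top.
            total_storage += delta_sizes[i]
            recreation.append(recreation[i - 1] + delta_sizes[i])
    return total_storage, recreation


# Hypothetical sizes: four versions of 100 bytes, each differing from its
# predecessor by a 10-byte delta.
full_sizes = [100, 100, 100, 100]
delta_sizes = [0, 10, 10, 10]

all_full = plan_costs(full_sizes, delta_sizes, {0, 1, 2, 3})
all_delta = plan_costs(full_sizes, delta_sizes, {0})
```

Under these assumed sizes, materializing every version uses 400 bytes of storage and reads 100 bytes per version, while keeping only version 0 in full uses 130 bytes but reads up to 130 bytes for the last version. Intermediate choices, such as materializing versions 0 and 2, fall between the two extremes.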
The ability to access and query data stored in multiple versions is an important asset for many appl...
Efficient management of versioned (temporal) data is an important problem for a variety of business ...
Data-driven methods and products are becoming increasingly common in a variety of communities, leadi...
With the massive proliferation of datasets in a variety of sectors, data science teams in these sect...
Relational databases have limited support for data collaboration, where teams collaboratively curate...
Version Control Systems were primarily designed to keep track of and provide control over changes to...
The demand for better reproducibility of research results is growing. With more data becoming availa...
A dataset, small or big, is often changed to correct errors, apply new algorithms, or add new data (...
This paper presents a framework for understanding and constructing access methods for ver...
This thesis investigates the extent to which delta-based version control can be used to manage versi...
As scientific endeavors and data analysis become increasingly collaborative, there is a need for dat...