Relational databases have limited support for data collaboration, where teams collaboratively curate and analyze large datasets. Inspired by software version control systems like git, we propose (a) a dataset version control system, giving users the ability to create, branch, merge, difference and search large, divergent collections of datasets, and (b) a platform, DATA HUB, that gives users the ability to perform collaborative data analysis building on this version control system. We outline the challenges in providing dataset version control at scale.
Many different kinds of scientific data are being produced every day by research institutes across t...
A dataset, small or big, is often changed to correct errors, apply new algorithms, or add new data (...
With a growing demand for transparency and openness around scientific research and an emphasis on th...
With the massive proliferation of datasets in a variety of sectors, data science teams in these sect...
While there have been many solutions proposed for storing and analyzing large volumes of data, all o...
As scientific endeavors and data analysis become increasingly collaborative, there is a need for dat...
This thesis covers the design, manual, and implementation details of OrpheusDB.
Openness and collaboration go hand in hand. Samuel Payne describes how scientists at the Pacific Nor...
The relative ease of collaborative data science and analysis has led to a proliferation of many thou...
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Com...
Collaboration is one of the most important topics regarding the evolution of the World Wide Web and ...
Version Control Systems were primarily designed to keep track of and provide control over changes to...