As digital data sources grow in number and size, they present an opportunity for computational investigation by means of text mining, NLP, and other text analysis techniques. The HathiTrust Research Center (HTRC) was recently established to provision automated analytical techniques over the more than 11 million digitized volumes (books) of the HathiTrust digital repository. The HTRC data store that hosts and provisions access to HathiTrust volumes needs to be efficient, fault-tolerant, and large-scale. In this paper, we propose three schema designs for a Cassandra NoSQL store to represent the HathiTrust corpus and perform an extensive performance evaluation using simulated workloads. The experimental results demonstrate that encapsulating the whole volume ...
A concept of distributed replicated NoSQL data storages Cassandra-like, HBase, MongoDB has been prop...
For several years the PanDA Workload Management System has been the basis for distributed production...
Usage of cloud-based storage systems gained a lot of prominence in the last few years. Every day million...
As digital data sources grow in number and size, they pose an opportunity for computational investi...
Libraries are seeing growing numbers of digitized textual corpora that frequently come with restrict...
To work with large amounts of data consisting of the 4 V's (velocity, variety, volume and veracity) that is...
NoSQL databases have emerged as a backend to support Big Data applications. NoSQL databases are char...
Cloud computing is a paradigm shift that provides computing over the Internet. With growing outreach of ...
Relational databases have been the main model for information data storage, retrieval and administra...
We investigate the problem of performance and cost optimization for two types of cloud-native distrib...
All companies developing their business on the Web, not only giants like Google or Facebook but also...
So far relational databases are used for storing the data for the applications but now there is need...
Cassandra is a NoSQL (Not only Structured Query Language) database which serves large amounts of data ...
HathiTrust Research Center (HTRC) allows users to access more than 3 million volumes through a serv...
The amount of internet-connected devices is rapidly expanding. Embedded with various sensors, these ...