The use of computational platforms such as Hadoop and Spark is growing rapidly as a successful paradigm for processing large-scale data residing in distributed file systems like HDFS. Increasing memory sizes have recently led to the introduction of caching and in-memory file systems. However, these systems lack any automated caching mechanisms for storing data in memory. This paper presents AutoCache, a caching framework that automates the decisions for when and which files to store in, or remove from, the cache for increasing system performance. The decisions are based on machine learning models that track and predict file access patterns from evolving data processing workloads. Our evaluation using real-world workload traces from a Facebo...
This paper describes the design, implementation, and evaluation of a predictive file caching approac...
As applications are moving towards peta and exascale data sets, it has become increasingly important...
Data is being generated at an enormous rate, due to online activities and use of resources related t...
In the era of big data, large-scale storage systems use NAND Flash-b...
Scientific collaborations are increasingly relying on large volumes of data for their work and many ...
Large scientific collaborations often have multiple scientists accessing the same set of files while...
Effective file system caching reduces local disk accesses and remote file server accesses significan...
Applications for Internet-enabled devices use machine learning to process captured data to make i...
A huge increase in data storage and processing requirements has led to Big Data, for which next gen...
The gap between CPU speeds and the speed of the technologies providing the data is increasing. As a...
Caching can effectively reduce the cost of serving content and improve the use...
The demand for highly parallel data processing platforms was growing due to an explosion in the numbe...
We propose and evaluate an approach for decoupling persistent-cache management from general file sys...
The envisaged Storage and Compute needs for the HL-LHC will be a factor up to 10 above what can be a...
The Hadoop Distributed File System (HDFS) is a network file system used to support multiple widely-u...