To tune and manage themselves, file and storage systems must understand key properties (e.g., access pattern, lifetime, size) of their various files. This paper describes how systems can automatically learn to classify the properties of files (e.g., read-only access pattern, short-lived, small in size) and predict the properties of new files, as they are created, by exploiting the strong associations between a file’s properties and the names and attributes assigned to it. These associations exist, strongly but differently, in each of four real NFS environments studied. Decision tree classifiers can automatically identify and model such associations, providing prediction accuracies that often exceed 90%. Such predictions can be used to selec...
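The idea of learning file properties from names and attributes can be illustrated with a minimal sketch. It assumes an ID3-style tree built on information gain, which is one common way to construct a decision tree and not necessarily the procedure the paper uses. The helper names (`features`, `build_tree`, `predict`), the chosen attributes (extension, uid, mode), and the six training records are all hypothetical and invented for illustration; they are not data from the NFS environments studied.

```python
from collections import Counter, defaultdict
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def best_attribute(rows, labels, attributes):
    """Return the attribute with the highest information gain, or None."""
    base = entropy(labels)
    best, best_gain = None, 0.0
    for attr in attributes:
        groups = defaultdict(list)
        for row, label in zip(rows, labels):
            groups[row[attr]].append(label)
        remainder = sum(len(g) / len(labels) * entropy(g)
                        for g in groups.values())
        gain = base - remainder
        if gain > best_gain + 1e-12:
            best, best_gain = attr, gain
    return best


def build_tree(rows, labels, attributes):
    """Recursively build a tree; leaves are class labels, nodes are dicts."""
    majority = Counter(labels).most_common(1)[0][0]
    attr = None
    if len(set(labels)) > 1:
        attr = best_attribute(rows, labels, attributes)
    if attr is None:
        return majority  # pure node, or no attribute separates the classes
    groups = defaultdict(lambda: ([], []))
    for row, label in zip(rows, labels):
        groups[row[attr]][0].append(row)
        groups[row[attr]][1].append(label)
    remaining = [a for a in attributes if a != attr]
    branches = {value: build_tree(sub_rows, sub_labels, remaining)
                for value, (sub_rows, sub_labels) in groups.items()}
    return {"attr": attr, "branches": branches, "default": majority}


def predict(tree, row):
    """Walk the tree; fall back to the node's majority class on unseen values."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attr"]], tree["default"])
    return tree


def features(name, uid, mode):
    """Attributes assumed to be known at file-creation time (hypothetical set)."""
    ext = name.rsplit(".", 1)[1] if "." in name else ""
    return {"ext": ext, "uid": uid, "mode": mode}


# Invented training records: (name, uid, mode) -> observed lifetime class.
training = [
    (("a.lock", 1001, "0600"), "short"),
    (("b.lock", 1002, "0644"), "short"),
    (("main.c", 1001, "0644"), "long"),
    (("util.c", 1002, "0644"), "long"),
    (("core.tmp", 1001, "0600"), "short"),
    (("notes.txt", 1002, "0644"), "long"),
]
rows = [features(*args) for args, _ in training]
labels = [label for _, label in training]
tree = build_tree(rows, labels, ["ext", "uid", "mode"])

# Predict the lifetime class of files at creation time.
print(predict(tree, features("new.lock", 1003, "0644")))
print(predict(tree, features("parser.c", 1001, "0644")))
```

In this toy data the file extension separates the classes completely, so the tree splits on it alone. A real trace would likely need several attributes, pruning, and a held-out test set to estimate accuracy.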
Modern high end computing systems store hundreds of petabytes of data and have billions of files, as...
The rapid advancements in technology have led to a significant increase in the amount of data being ...
Prediction is a powerful tool for performance and usability. It can reduce access latency for I/O sy...
For typical workloads and file naming conventions, the size, lifespan, read/write ratio, and access ...
We present evidence that attributes that are known to the file system when a file is created, such a...
Network File System (NFS, de facto in Linux) or Common Internet File System (CIFS, de facto in Windo...
Systems should be self-predicting. They should continuously monitor themselves and provide quantitat...
As a data-intensive computing application, high-energy physics requires storage and computing for la...
Parallel input/output characterization studies and experiments with flexible resource management alg...
File correlations have become an increasingly important consideration for performance enhancement in...
The main concern in information-rich systems is to efficiently navigate and access desired informati...
Traditionally, maximizing input/output performance has required tailoring application input/output ...
Most existing studies of file access prediction are experimental in nature and rely on trac...
The use of computational platforms such as Hadoop and Spark is growing rapidly as a successful parad...
This paper explores an attribute-based approach to storing information in the context of a file syst...