Distributed environments such as networks of workstations (NOWs) are becoming more cost-effective and popular, and more high performance computations are moving into such environments. Although in the scientific computing arena NOWs have been used mainly for their high performance, their potential for reliability and high availability has not been fully exploited. One important area where this capability can be exploited on NOWs is to ensure data reliability and availability through the parallel I/O system. In this paper, we investigate the availability issues in parallel I/O systems with a shared-nothing disk configuration. A new file replication method, called locality aware file replication (LAFR), is proposed. LAFR is app...
This paper presents a parallel file object environment to support distributed array stor...
Providing data availability in a high performance computing environment is very importan...
Distributed systems provide the opportunity for fault tolerance through replication. This dissertati...
As parallel file systems span larger and larger numbers of nodes in order to provide the perfo...
Replication plays an important role in storage systems, improving data availability, throughput and r...
Parallel input/output in high performance computing is a field of increasing importance. In particul...
As multi-petascale and exa-scale high-performance computing (HPC) systems inevitably have t...
Nowadays, replication is widely used in data center storage systems to prevent data loss. D...
Dumping large amounts of related data simultaneously to local storage devices...
In this paper, we propose a new fault-tolerant model for replication in distributed-file...
This thesis studies the problem of file replication in distributed systems. File replication is desi...
This paper studies the problem of code symbol availability: a code symbol is said to have (...
The advent of wide-area high-speed networks provides the framework for deploying large scale applica...
Massively parallel applications often require periodic data checkpointing for program resta...
The introduction of Exascale storage into production systems will lead to an increase in the number ...