Many scientific problems benefit from coarse-grained parallel computation. Collections of loosely-coupled, heterogeneous computers are increasingly being applied to these problems. While individual computers are designed to be relatively reliable, a collection of several autonomous machines necessarily has a higher failure rate. As data networks improve and larger multicomputers come into use, failure rates will increase. PVM (Parallel Virtual Machine) [Sun90, GS92] is a popular software framework that facilitates message-passing network programming. We present enhancements to PVM that mask fail-stop, single-node failures from the application. Fail-safe PVM uses checkpoint and rollback to recover from such failures. Both...
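The checkpoint-and-rollback idea named in the abstract above can be illustrated with a minimal single-process sketch. This is not the Fail-safe PVM implementation and uses no PVM calls. The names `Worker`, `run_with_recovery`, `fail_at`, and `interval` are hypothetical, the checkpoint is held in memory rather than on stable storage, and the failure is simulated.

```python
import copy


class Worker:
    """Hypothetical worker whose state can be saved and restored."""

    def __init__(self):
        self.state = {"step": 0, "total": 0}
        # The initial state serves as the first checkpoint.
        self._checkpoint = copy.deepcopy(self.state)

    def checkpoint(self):
        # Save an independent copy of the current state.
        self._checkpoint = copy.deepcopy(self.state)

    def rollback(self):
        # Discard the current state and restore the last checkpoint.
        self.state = copy.deepcopy(self._checkpoint)

    def work(self, value):
        self.state["step"] += 1
        self.state["total"] += value


def run_with_recovery(values, fail_at=None, interval=2):
    """Sum `values`, checkpointing every `interval` steps.

    If `fail_at` is given, a single simulated failure occurs just before
    that index is processed; the worker rolls back and redoes the steps
    performed since the last checkpoint.
    """
    worker = Worker()
    failed_once = False
    i = 0
    while i < len(values):
        if fail_at is not None and i == fail_at and not failed_once:
            failed_once = True
            worker.rollback()
            # Resume from the step recorded in the restored state.
            i = worker.state["step"]
            continue
        worker.work(values[i])
        i += 1
        if i % interval == 0:
            worker.checkpoint()
    return worker.state["total"]
```

With or without the simulated failure, the computation should reach the same result, which is what masking a failure from the application means in this simplified setting. A real message-passing system would also have to account for messages in transit, which this sketch ignores.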
Crash and omission failures are common in service providers: a disk can break down or a link can fai...
One of the more bothersome aspects of developing a parallel program is that of monitoring the behavi...
The PVM system -- currently one of the most popular message-passing interfaces -- represents...
There is a growing trend toward distributed computing -- writing programs that run across multiple n...
A network multicomputer is a multiprocessor in which the processors are connected by general-purpose...
Message passing applications on a distributed computer require tools to integrate state saving and r...
In the past decade, the use of distributed algorithms to model simulations has increased considerably...
We are currently involved in research to enable PVM to take advantage of shared networks of workstat...
This article presents mEDA-2, an extension to PVM that provides Virtual Shared Memory (VSM) for int...
Parallel Virtual Machine (PVM) is a standard software system for parallel computing on networked com...
Distributed Shared Memory (DSM) architectures are attractive for executing high performance parallel ...
We have implemented a commercial enterprise-grade system for providing fault-tolerant virtual machin...
This study explores a recovery strategy using checkpointing in a distributed shared virtual memory (...
Large-scale distributed systems are very attractive for the execution of parallel applications requi...