Process/thread migration and checkpointing schemes support load balancing, load sharing and fault tolerance to improve application performance and system resource usage on workstation clusters. To enable these schemes to work in heterogeneous environments, we have developed an application-level migration and checkpointing package, MigThread, to abstract computation states at the language level for portability. To save and restore such states across different platforms, this paper proposes a novel “Receiver Makes Right” (RMR) data conversion method, called Coarse-Grain Tagged RMR (CGT-RMR), for efficient data marshalling and unmarshalling. Unlike common data representation standards, CGT-RMR does not require programmers to analyze dat...
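The general "Receiver Makes Right" idea can be illustrated with a minimal Python sketch: the sender writes data in its own native byte order and attaches a single tag for the whole block, and the receiver converts only when the tag differs from its local format. The tag values, the header layout, and the function names `marshal` and `unmarshal` are assumptions made for illustration; this is not the tag format or API of MigThread or CGT-RMR.

```python
import struct
import sys

# Hypothetical one-byte tag values (an assumption, not the CGT-RMR tag format).
TAG_LITTLE, TAG_BIG = 0, 1
HEADER_FMT = ">BI"  # tag byte + element count, always big-endian
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 5 bytes, no padding


def marshal(values, sender_order=sys.byteorder):
    """Sender: emit 32-bit ints in the sender's own byte order.

    One tag describes the whole block (coarse grain), so the sender
    does no per-element conversion work.
    """
    prefix = "<" if sender_order == "little" else ">"
    tag = TAG_LITTLE if sender_order == "little" else TAG_BIG
    header = struct.pack(HEADER_FMT, tag, len(values))
    return header + struct.pack(f"{prefix}{len(values)}i", *values)


def unmarshal(buf, local_order=sys.byteorder):
    """Receiver: convert only if the sender's order differs from ours.

    Returns (values, converted), where converted reports whether a
    byte swap was needed.
    """
    tag, count = struct.unpack_from(HEADER_FMT, buf, 0)
    sender_order = "little" if tag == TAG_LITTLE else "big"
    payload = buf[HEADER_SIZE:HEADER_SIZE + 4 * count]
    converted = sender_order != local_order
    if converted:
        # Swap each 4-byte element into the receiver's byte order.
        payload = b"".join(
            payload[i:i + 4][::-1] for i in range(0, len(payload), 4)
        )
    prefix = "<" if local_order == "little" else ">"
    return list(struct.unpack(f"{prefix}{count}i", payload)), converted
```

When sender and receiver share a byte order, the payload is read back without any swapping, which is the case this style of conversion is meant to make cheap.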
Checkpointing of parallel applications can be used as the core technology to provide process migrati...
To make progress in the face of failures, long-running parallel applications need to save their ...
Clusters of industry-standard multiprocessors are emerging as a competitive alternative for large-sc...
Process/thread migration and checkpointing are indispensable for resource sharing, cycle stealing, ...
Thread migration moves a single call-stack to another machine to improve either load balancing or lo...
Migration concerns saving the current computation state, transferring it to remote machines, and res...
Distributed Shared Memory (DSM) systems provide a logically shared memory over physically distribute...
This paper describes a generic mechanism to migrate threads in heterogeneous distributed environment...
High performance computing (HPC) systems use checkpoint-restart to tolerate failures. Typically, app...
Thread migration is established as a mechanism for achieving dynamic load sharing and data locality....
A lot of research has been done on fault-tolerance for MPI applications, some on checkpoint/restart,...
This dissertation describes the design, implementation, and performance of two mechanisms that addre...
Thread migration is established as a mechanism for achieving dynamic load sharing and data locality...
This is a post-peer-review, pre-copyedit version of an article published in Journal of Supercomputin...