This paper discusses ongoing work with the Integrated Plasma Simulator (IPS), a framework for coupled multiphysics simulations of plasmas, to allow simulations to continue running through the loss of nodes on which they are executing. While many different techniques are available to improve the fault tolerance of computational science applications on high-performance computer systems, checkpoint/restart (C/R) remains virtually the only one that sees widespread use in practice. Our focus here is to augment the traditional C/R approach with additional techniques that can provide a more localized and tailored response to faults, based on the ability to restart failed tasks on an individual basis and on the use of information external to the ap...
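The contrast between global C/R and restarting failed tasks individually can be illustrated with a minimal sketch. It assumes a hypothetical in-process task model: `NodeLost`, `run_with_task_restart`, the per-task checkpoint dictionary, and the task names are illustrative only and are not the IPS API. Each task saves its own progress; when its node is lost, only that task resumes from its last checkpoint, and falling back to a global restart is left to the caller once a restart budget is exhausted.

```python
from typing import Callable, Dict, List

class NodeLost(Exception):
    """Hypothetical: raised when the node executing a task disappears."""

Save = Callable[[int], None]          # records a task-local checkpoint (a step number)
Task = Callable[[int, Save], None]    # runs from a start step, calling save() as it goes

def run_with_task_restart(tasks: Dict[str, Task], max_restarts: int = 2) -> Dict[str, List[int]]:
    """Run each task; on NodeLost, restart only that task from its own checkpoint.

    Returns, per task, the list of steps it was (re)started from.
    """
    checkpoints: Dict[str, int] = {name: 0 for name in tasks}
    starts: Dict[str, List[int]] = {name: [] for name in tasks}
    for name, task in tasks.items():
        def save(step: int, name: str = name) -> None:
            checkpoints[name] = step  # task-local checkpoint, not a global one
        for _ in range(max_restarts + 1):
            starts[name].append(checkpoints[name])
            try:
                task(checkpoints[name], save)
                break
            except NodeLost:
                continue  # localized response: other tasks are untouched
        else:
            # Restart budget exhausted: a real system might fall back to global C/R here.
            raise RuntimeError(f"{name}: restart budget exhausted")
    return starts

def make_flaky(fail_at: int, steps: int) -> Task:
    """Hypothetical task that loses its node once, at step `fail_at`."""
    state = {"failed": False}
    def task(start: int, save: Save) -> None:
        for step in range(start, steps):
            if step == fail_at and not state["failed"]:
                state["failed"] = True
                raise NodeLost(step)
            save(step + 1)
    return task

def steady(start: int, save: Save) -> None:
    """Hypothetical task that never fails."""
    for step in range(start, 4):
        save(step + 1)

print(run_with_task_restart({"equilibrium": steady, "transport": make_flaky(5, 10)}))
# → {'equilibrium': [0], 'transport': [0, 5]}
```

Only the failed task is re-run, and it resumes from step 5 rather than step 0; under plain C/R the whole coupled simulation would instead roll back to the last global checkpoint.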
This paper presents FT-GAIA, a software-based fault-tolerant parallel and distributed simulation mid...
Supercomputers have played an essential role in the progress of science and engineering research. As...
Workflow systems are considered here to support large-scale multiphysics simu...
We propose a novel, minimally intrusive approach to adding fault tolerance to existing complex scien...
Large-scale simulations, e.g. fluid-structure interactions and aeroacoustics n...
Thesis (Ph.D.), Indiana University, Computer Sciences, 2010. Scientists use advanced computing techniques to assist in answering the complex questions at the for...
Proceedings of the First PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD 2016)...