Abstract—xSim is a simulation-based performance investigation toolkit that permits running high-performance computing (HPC) applications in a controlled environment with millions of concurrent execution threads, while observing application performance in a simulated extreme-scale system for hardware/software co-design. The presented work details newly developed features for xSim that permit the injection of MPI process failures, the propagation/detection/notification of such failures within the simulation, and their handling using application-level checkpoint/restart. These new capabilities enable the observation of application behavior and performance under failure within a simulated future-generation HPC system using the most common fa...
With the deployment of 10-20 PFlop/s supercomputers and the exascale roadmap targeting 100, 300, and...
Abstract—The Extreme-scale Simulator (xSim) is a recently developed performance investigation toolki...
The consistent trends of increasing core counts and decreasing mean-time-to-failure in supercomputer...
As supercomputers scale to 1,000 PFlop/s over the next decade, investigating the performance of par...
Supercomputers have played an essential role in the progress of science and engineering research. As...
Scientists use advanced computing techniques to assist in answering the complex questions at the for...
2015-08-04 Future exascale high-performance computing (HPC) systems will be constructed using VLSI de...
Abstract. This paper discusses on-going work with the Integrated Plasma Simulator (IPS), a framework f...
Large scale systems provide a powerful computing platform for solving large and complex scientific a...
High Performance Computing (HPC) brings with it the promise of deeper insight into complex phenomen...
HPC systems are widely used in industrial, economical, and scientific applications, and many of thes...
Abstract. Large-scale computing platforms provide tremendous capabilities for scientific discovery. ...
As supercomputers become larger and more powerful, they are growing increasingly complex. This is re...
The running times of many computational science applications are much longer than the mean-time-to-f...
Proceedings of the First PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD 2016)...