The goal of this research was to investigate the potential for employing dynamic, decentralized software architectures to achieve reliability in future high-performance computing platforms. These architectures, inspired by peer-to-peer networks such as botnets that already scale to millions of unreliable nodes, hold promise for enabling scientific applications to run usefully on next-generation exascale platforms (approximately 10^18 operations per second). Traditional parallel programming techniques suffer rapid deterioration of performance scaling with growing platform size, as the work of coping with increasingly frequent failures dominates over useful computation. Our studies suggest that new architectures, in which failures are treated...
Abstract—Owing to the extreme parallelism and the high component failure rates of tomorrow’s exascal...
We present here a report produced by a workshop on “Addressing Failures in Exascale Computing” held ...
Distributed systems and extreme-scale systems are ubiquitous in recent years and have seen throughou...
The emergence of petascale systems and the promise of future exascale systems have reinvigorated the...
We present here a report produced by a workshop on ‘Addressing failures in exascale computing’ held ...
To enable future scientific breakthroughs and discoveries, the next generation of scientific applica...
Extreme scale parallel computing systems will have tens of thousands ...
Extreme scale parallel computing systems will have tens of thousands of option...
The Petascale Computing Enabling Technologies (PCET) project addressed challenges arising from curre...
© The Authors 2015. This paper is published with open access at SuperFri.org. Extreme scale paralle...
As supercomputers become larger and more powerful, they are growing increasingly complex. This is re...
High-Performance Computing (HPC) has passed the Petascale mark and is moving forward to Exascale. As...
The next generation of supercomputers will break the exascale barrier. Soon we will have systems cap...
Proceedings of the First PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD 2016)...
Resilience is a major roadblock for HPC executions on future exascale systems. These systems will ty...