Future computing systems (Teradevices) will probably contain more than 1000 cores on a single die. To exploit this parallelism, threaded dataflow execution models are promising, since they provide side-effect-free execution and reduced synchronization overhead. But the terascale transistor integration of such chips makes them orders of magnitude more vulnerable to voltage fluctuations, radiation, and process variations. Reliability techniques must therefore be an essential part of such future systems. In this paper, we conceptualize a fault-tolerant architecture for a scalable threaded dataflow system. We provide methods to detect permanent, intermittent, and transient faults during execution. Furthermore, we propose a recovery t...
This paper presents the design, implementation and evaluation of a dataflow system, including a data...
This paper designs and implements the Redundant Multi-Threading (RMT) in a Data-flow scheduled Multi...
As high performance computing (HPC) systems continue to grow, their fault rate increases. Applicatio...
The high parallelism of future Teradevices, which are going to contain more than 1,000 complex cores...
This electronic version was submitted by the student author. The certified thesis is available in th...
As microprocessors continue to evolve and grow in functionality, the use of smaller nanometer techn...
Silicon reliability has reemerged as a very important problem in digital system design. As volta...
The progress of the semiconductor technology and the resulting increase of the transistor integratio...
The TERAFLUX project is a Future and Emerging Technologies (FET) Large-Scale Project funded by the E...
The improvements in semiconductor technologies are gradually enabling extreme-scale systems such as ...
We present a technique that masks failures in a cluster to provide high availability and fault-toler...
The number of cores per chip keeps increasing in order to improve performance while controlling the ...