Computing systems are becoming increasingly complex, with nodes that combine multi-core central processing units (CPUs) with Many Integrated Core (MIC) and graphics processing unit (GPU) accelerators. These computing units and their interconnections are subject to different classes of hardware and software faults, which must be detected so that mitigation measures can be applied. We present the chaotic-map method, which uses the exponential divergence and wide Fourier properties of the trajectories, combined with memory allocations and assignments, to diagnose component-level faults in these hybrid computing systems. We propose lightweight codes that utilize highly parallel chaotic-map computations tailored to isolate faults in arithmetic u...
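The exponential-divergence property the abstract relies on can be illustrated with a short sketch. This is not the authors' code: the logistic map x → r·x·(1 − x) at r = 4, the single-bit-flip fault model, the seed 0.3 and the helper names `flip_bit`, `logistic_trajectory` and `first_divergence` are all hypothetical choices made for illustration. The idea is that a fault-free unit reproduces a reference trajectory bit for bit (assuming both runs execute the same floating-point operations in the same order), whereas a single corrupted bit is amplified roughly twofold per iteration on average until the two trajectories separate by an easily detected margin.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit of the IEEE-754 double-precision encoding of `value`."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << bit)))
    return flipped


def logistic_trajectory(x0, r=4.0, steps=200, fault_step=None, fault_bit=20):
    """Iterate the logistic map x -> r*x*(1 - x), which is chaotic at r = 4.

    If `fault_step` is set, one mantissa bit of the state produced at that
    iteration is flipped, emulating a transient arithmetic fault. This fault
    model is a hypothetical assumption for illustration only.
    """
    xs = [x0]
    x = x0
    for n in range(steps):
        x = r * x * (1.0 - x)
        if n == fault_step:
            x = flip_bit(x, fault_bit)
        xs.append(x)
    return xs


def first_divergence(ref, test, tol=1e-3):
    """Return the first index at which two trajectories differ by more
    than `tol`, or None if they never do."""
    for n, (a, b) in enumerate(zip(ref, test)):
        if abs(a - b) > tol:
            return n
    return None


if __name__ == "__main__":
    reference = logistic_trajectory(0.3)
    repeat = logistic_trajectory(0.3)                   # fault-free rerun
    faulty = logistic_trajectory(0.3, fault_step=10)    # one flipped bit

    print("fault-free rerun diverges at:", first_divergence(reference, repeat))
    print("initial error after the flip:", abs(reference[11] - faulty[11]))
    print("faulty run diverges at step:", first_divergence(reference, faulty))
```

The flipped bit initially perturbs the state by only about 1e-11, yet because the error grows roughly like 2^n, the faulty trajectory crosses the 1e-3 threshold within a few dozen iterations. Detection then reduces to comparing states against a reference, a cheap operation that can be replicated across many units in parallel.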
The goal of the fault diagnosis agreement (FDA) problem is to make each fault-free proce...
Probabilistic diagnosis aims at making the system-level fault diagnostic problem both easier to solv...
Over three decades of continuous scaling in CMOS technology has led to tremendous improvements in pr...
This dissertation summarizes experimental validation and co-design studies conducted to optimize the...
This book offers a selection of papers in the field of fault detection and diagnosis, promoting new ...
Deep learning technology has enabled the development of increasingly complex safety-related autonomo...
The reliability of processors is an important issue in designing a massively parallel pr...
We develop a widely applicable algorithm to solve the fault diagnosis problem in certain distributed...
Detection, diagnosis and mitigation of performance problems in today's large-scale distributed an...
Hypercube multiprocessor systems have attracted many researchers in parallel processin...
Comparison-based diagnosis is a practical approach to the system-level fault diagnosis of mu...
As chip densities and clock rates increase, processors are becoming more susceptible to transient fa...
Intermittent hardware faults are hard to diagnose as they occur non-deterministically. Hardware-only...
Graphics Processing Units (GPUs) are over-stressed to accelerate High-Performa...