Over the past decade, GPUs have become an integral part of mainstream high-performance computing (HPC) facilities. Because applications on HPC systems are usually long-running, any error or failure can cause a significant loss of scientific productivity and system resources. Moreover, HPC systems face severe resilience challenges as they progress toward exascale computing, so it is imperative to develop a better understanding of GPU reliability. This dissertation fills this gap by characterizing the effects of soft errors on the entire system and on specific applications. To understand system-level reliability, a large-scale field study of GPU soft errors is conducted. The occurrences of GPU soft ...
General Purpose Graphics Processing Units (GPGPUs) have been extensively used in the last d...
Graphics Processing Units (GPUs) are considered a promising solution for high-performance safety-cri...
According to Moore’s law, technology scaling is continuously providing smaller and faster devices. T...
GPGPUs are used increasingly in several domains, from gaming to different kinds of compu...
While graphics processing units (GPUs) have gained wide adoption as accelerators for genera...
Multiprocessor system-on-chip such as embedded GPUs are becoming very popular in safety-critical app...
Graphics Processing Units (GPUs) are popular for reliability-conscious uses in High Performance Comp...
State-of-the-art GPU chips are designed to deliver extreme throughput for graphics as well as for da...
Graphics processing units (GPUs) are gaining widespread use in high-performance computing b...
Proceedings of: 31st European Symposium on Reliability of Electron Devices, Failure Physics and Analy...
There has been extensive use of Convolutional Neural Networks (CNNs) in healthcare applications....
General-purpose computing on Graphics Processing Units offers a remarkable speedup for data parallel ...
There has been extensive use of Convolutional Neural Networks (CNNs) in healthcare applications....
Graphics Processing Units (GPUs) are over-stressed to accelerate High-Performa...