As the exascale era approaches, the increasing capacity of high-performance computing (HPC) systems, combined with targeted power and energy budget goals, introduces significant reliability challenges. Silent data corruptions (SDCs), or silent errors, are a major threat: they corrupt the execution results of HPC applications without being detected. In this work, we explore a set of novel SDC detectors for HPC applications that leverage epsilon-insensitive support vector machine regression. The key contributions are threefold. (1) Our exploration takes temporal, spatial, and spatiotemporal features into account and analyzes different detectors based on different features. (2) We provide an in-depth study on the ...
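The abstract does not spell out the detection rule, but one plausible way an SVR-based SDC detector could work is to predict each value from its features and flag values that land too far from the prediction. The sketch below illustrates that idea under several assumptions that are not taken from the paper: a single temporal feature (the value at the previous time step), a linear SVR trained by plain subgradient descent on the primal epsilon-insensitive objective (SVM libraries typically solve the dual with specialized solvers instead), and a fixed, hypothetical detection `radius`. The function names `fit_linear_svr`, `temporal_pairs`, and `is_sdc` are illustrative only.

```python
"""Minimal temporal SDC detector on top of a linear epsilon-insensitive SVR.

Illustrative sketch under stated assumptions (not the authors' method):
* one temporal feature: the value at the previous time step;
* a linear SVR trained by subgradient descent on the primal objective
  0.5*w**2 + (C/n) * sum(max(0, |y - (w*x + b)| - eps));
* a value is flagged when it falls outside a hypothetical detection
  radius around the SVR prediction.
"""
from typing import List, Sequence, Tuple


def fit_linear_svr(x: Sequence[float], y: Sequence[float], eps: float = 0.01,
                   C: float = 10.0, lr: float = 0.001,
                   epochs: int = 2000) -> Tuple[float, float]:
    """Return (w, b) approximately minimising the epsilon-insensitive objective."""
    n = len(x)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w, grad_b = w, 0.0  # the regulariser 0.5*w**2 contributes w
        for xi, yi in zip(x, y):
            r = yi - (w * xi + b)
            if abs(r) > eps:  # outside the epsilon-tube: linear penalty
                s = 1.0 if r > 0 else -1.0
                grad_w -= C * s * xi / n
                grad_b -= C * s / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def temporal_pairs(series: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Pair each value with its predecessor: feature v[t-1], target v[t]."""
    return list(series[:-1]), list(series[1:])


def is_sdc(w: float, b: float, prev: float, current: float,
           radius: float = 0.5) -> bool:
    """Flag `current` if it lies outside the (hypothetical) detection radius."""
    return abs(current - (w * prev + b)) > radius


if __name__ == "__main__":
    clean = [0.05 * t for t in range(41)]  # smooth, slowly varying data
    w, b = fit_linear_svr(*temporal_pairs(clean))
    print(is_sdc(w, b, prev=1.0, current=1.05))  # plausible next value: False
    print(is_sdc(w, b, prev=1.0, current=9.05))  # corrupted value: True
```

Because the model is trained on clean data, its prediction errors stay near the tube width `eps`, so a corrupted value (here 9.05 in place of 1.05) falls far outside the radius. Spatial or spatiotemporal variants would change only how the feature vector is built, for example by adding values from neighboring data points alongside the previous time step.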
Probabilistic Support Vector Machine Classification (PSVC) is a real-time detection and prediction a...
The Support Vector Machine (SVM) method has been used with success in classifying Partial Discharge ...
Phasor Measurement Units (PMUs) provide high-quality state information about the electrical grid in ...
As the exascale era approaches, the increasing capacity of high-performance computing (HPC) systems ...
Silent Data Corruption (SDC) is a serious reliability issue in many domains, including embedded syst...
Many methods are available to detect silent errors in high-performance computing (HPC) app...
Hardware errors are on the rise as chip sizes shrink, and power constraints have necessitated th...
According to Moore’s law, technology scaling is continuously providing smaller and faster devices. T...
In a Real Time Clearing System (RTCS), there are several thousand transactions per second, and ev...
Chip manufacturers and hyperscalers are becoming increasingly aware of the problem posed by...
Today more than ever before, academia, manufacturers, and hyperscalers acknowledge the majo...
This report describes a unified framework for the detection and correction of silent errors, which co...
Data reduction techniques have been widely demanded and used by large-scale high-performance computi...
Handling faults is a growing concern in HPC; greater varieties, higher error rates, larger detection...