Taufer, Michela. Non-determinism in high performance scientific applications has severe detrimental impacts on numerical reproducibility and accuracy, and on debugging. As scientific simulations are migrated to extreme-scale platforms, the increase in platform concurrency and the attendant increase in non-determinism are likely to exacerbate both of these problems. In this thesis, we address the dual challenges of non-determinism’s impact on numerical reproducibility and on debugging. To address the numerical challenge, our work investigates the power of mathematical methods to mitigate error propagation at the exascale. We focus on floating-point error accumulation over global summations where enforcing any reduction order is expens...
Relative debugging helps trace software errors by comparing two concurrent executions of a program -...
Abstract. The precise semantics of floating-point arithmetic programs depends on the execution platf...
Errors pose a serious threat to the output validity of modern data processing, which is often perfor...
Debugging large-scale, data-intensive, distributed applications running in a datacenter ("datacenter...
Ideal hardware performance counters provide exact deterministic results. Real-world performance moni...
Pre-print. Reproducibility, the ability to repeat program executions with the same numerical result or...
This work is based on the seminar titled ‘Resiliency in Numerical Algorithm Design for Extreme Scale...
Deterministic replay tools offer a compelling approach to debugging hard-to-reproduce bugs. Recent w...
International audience. Questions whether numerical simulation is reproducible or not have been report...
One of the main reasons for the difficulty of hardware verification is that hardware platforms are t...
On modern multi-core, many-core, and heterogeneous architectures, floating-point computations, espec...
Statistical debugging identifies program behaviors that are highly correlated with failures. Tra...
Abstract. Bit-reproducibility has many advantages in the context of high-performance computing. Besid...
Submitted to the JSA Elsevier Journal. This paper investigates the application of Deterministic Record...
Numerical Reproducibility at Exascale (NRE2015) workshop held as part of the Supercomputing Conferen...