Evaluating a program's behaviour in the presence of transient faults is often a very time-consuming task. Obtaining statistically significant data requires thousands of executions, and each execution carries the considerable overhead of the fault injection environment. A previously published methodology significantly reduced the time needed to evaluate the robustness of a program execution by exhaustively analysing its execution trace instead of using fault injection. In this paper we present a further reduction in the time needed to evaluate the robustness of parallel programs against transient faults, obtained by combining this methodology with PAS2P, a method that strives to describe an application based on its message-passing activity. This combin...
Emerging MPI libraries, such as VolpexMPI and P2P MPI, allow message passing parallel programs to ex...
High performance computing platforms such as Clusters, Grid and Desktop Grids ...
The amount of parallelism in modern supercomputers currently grows from generation to generation. Fu...
Analyzing and predicting performance in parallel applications is a great challenge for scien...
Resiliency of exascale systems has quickly become an important concern for the scientific community....
We present in this paper a study on fault management in a grid middleware. The...
Fault tolerance in parallel systems has traditionally been achieved through a combination of redunda...
Handling faults is a growing concern in HPC; higher error rates, larger detection intervals and sile...
Handling faults is a growing concern in HPC; greater varieties, higher error rates, larger detection...
This thesis focuses on fault-tolerance for MPI codes on computational clusters. When an application ...
The predictability of various types of program information has been the subject of a plethora of wor...
MPI is the de facto standard message-passing parallel programming model. However, the bug dete...
Proceedings of the First PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD 2016)...