The work described in this paper aims at effective computation resilience for complex simulations in high-performance and distributed environments. Computation resilience is a complicated and delicate area: it deals with many types of simulation cores, many types of data at various input levels, and many types of end-users, who have different requirements and expectations. Predictions about system and computation behavior must be based on deep knowledge of the underlying infrastructures and of the simulations' mathematical and implementation backgrounds. Our conceptual framework is intended to allow independent collaboration between domain experts as end-users and providers of computational power by taking on all of the deployme...
In this report we present a thorough study of the concept of resiliency in distributed workflow syst...
Large scale simulations are used in a variety of application areas in science and engineering to hel...
This report presents an approach to design, implement and deploy resilient distributed workflows. It...
This work is based on the seminar titled ‘Resiliency in Numerical Algorithm Design for Extreme Scale...
Workflow systems are considered here to support large-scale multiphysics simu...
This paper presents an approach to design, implement and deploy a simulation p...
This thesis presents the notion of computational resiliency to provide reliability in heterogeneous ...
The realization of high performance simulation necessitates sophisticated si...
High performance simulation that supports sophisticated simulation experimen...
Large scale systems provide a powerful computing platform for solving large and complex scientific a...
In this paper we present research on improving the resilience of the execution of scientific softwar...
Large-scale simulation and optimization are demanding applications that requir...
Mitigating the risks of extreme natural hazards, such as hurricanes and earthquakes, triggers intric...
Projections and reports about exascale failure modes conclude that we need to protect numerical simu...