Future extreme-scale supercomputers will gather hundreds of millions of cores. The main problem we address is energy consumption, since such systems will consume an enormous amount of energy. We also need to overcome important fault-tolerance challenges in such extreme-scale systems. The energy consumption of fault-tolerance protocols varies with parameters such as the platform characteristics, the application features, and the number of processes used in the execution. Currently, the only way to evaluate the power consumption of fault-tolerant protocols in a given execution context is to run the application with each version of the fault-tolerant protocols and monitor t...
This thesis deals with two issues for future Exascale platforms, namely resilience and energy. We ad...
Checkpointing is a fault-tolerance mechanism commonly used in High Throughput Computing (HTC) enviro...
Investigating different ways to reduce energy consumption during the execution of large-scale appl...
The most widely used fault-tolerance method in high-performance computing (HPC) today is coordinated checkp...
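Coordinated checkpointing requires choosing how often to take a global checkpoint: checkpointing too often wastes time and energy on I/O, while checkpointing too rarely means more work is re-executed after a failure. As an illustration only (this model is not taken from the abstracts above), the sketch below uses Young's well-known first-order approximation of the optimal checkpoint period, T = sqrt(2 C M), where C is the checkpoint cost and M the platform mean time between failures (MTBF). It assumes independent, exponentially distributed node failures, so that the platform MTBF is the node MTBF divided by the number of nodes, and it assumes C is much smaller than M. The function names and numeric parameters are hypothetical.

```python
import math


def platform_mtbf(node_mtbf_s: float, num_nodes: int) -> float:
    """Platform MTBF in seconds, assuming independent exponential node failures."""
    return node_mtbf_s / num_nodes


def young_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's first-order approximation of the optimal checkpoint period (seconds).

    Only meaningful when the checkpoint cost is much smaller than the MTBF.
    """
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


if __name__ == "__main__":
    # Hypothetical parameters: 10-year node MTBF, 100,000 nodes, 10-minute checkpoint.
    year_s = 365 * 24 * 3600
    m = platform_mtbf(10 * year_s, 100_000)
    t = young_interval(600.0, m)
    print(f"platform MTBF: {m / 60:.1f} min, checkpoint every {t / 60:.1f} min")
```

The example shows why scale matters: as the node count grows, the platform MTBF shrinks, which pushes the checkpoint period down and increases the share of time and energy spent on fault tolerance.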
Energy consumption and fault tolerance are two interrelated issues to address ...
Extreme-scale computing is set to provide the infrastructure for the advances and breakthroughs that...