Recently, the benefits of co-scheduling several applications have been demonstrated in a fault-free context, both in terms of performance and energy savings. However, large-scale computer systems are confronted with frequent failures, and resilience techniques must be employed to ensure the completion of large applications. Indeed, failures may create severe imbalance between applications and significantly degrade performance. In this paper, we propose to redistribute the resources assigned to each application when failures strike, in order to minimize the expected completion time of a set of co-scheduled applications. First, we introduce a formal model and establish complexity results. When no redistribution is allowed, we can minim...
Most list scheduling heuristics rely on a simple platform model where communication contentio...
This thesis addresses resilience for high-performance applications at very large scale...
High performance computing applications must be resilient to faults. The tradi...
This thesis explores co-scheduling problems in the context of large-scale applications with two main...
This paper investigates co-scheduling algorithms for processing a set of parallel applications. Inst...
We study the scheduling of computational workflows on compute resources that experience exponentially...
Proc. of the 37th IEEE International Conference on Parallel Processing (ICPP 2008) IEEE Computer Soci...
This thesis focuses on resilience for high performance applications that execute on large scale plat...
Embedded systems account for a major part of critical applications (space, aeronautics, nuclear. ....
This thesis is focused on the two major problems in the high performance computing context: resilien...
Platforms that comprise volatile processors, such as desktop grids, have been ...