Recently, the benefits of co-scheduling several applications have been demonstrated in a fault-free context, both in terms of performance and energy savings. However, large-scale computer systems are confronted with frequent failures, and resilience techniques must be employed to ensure that large applications complete. Indeed, failures may create severe imbalance between applications and significantly degrade performance. In this paper, we propose to redistribute the resources assigned to each application whenever a failure strikes, in order to minimize the expected completion time of a set of co-scheduled applications. First, we introduce a formal model and establish complexity results. When no redistribution ...
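The redistribution idea in the first abstract can be illustrated with a minimal sketch. The model here is our own assumption, not the paper's: application i has remaining work W_i and runs on p_i >= 1 processors with a hypothetical speedup function (perfectly parallel, or Amdahl's law with a sequential fraction). Redistribution is assumed to be free, and the objective is the deterministic latest completion time rather than an expected value. The hypothetical `redistribute` function hands out processors greedily, always to the application that would currently finish last. Whenever completion time does not increase with p_i, this greedy rule yields an allocation that minimizes the latest completion time. After a failure, one would add the lost work (for example, the work done since the last checkpoint plus a recovery cost) to the affected application's W_i and call `redistribute` again.

```python
from typing import Callable, List

# Hypothetical speedup models (assumptions, not taken from the paper).
def perfectly_parallel(p: int) -> float:
    return float(p)

def amdahl(seq_fraction: float) -> Callable[[int], float]:
    # Amdahl's law: speedup(p) = 1 / (s + (1 - s) / p)
    return lambda p: 1.0 / (seq_fraction + (1.0 - seq_fraction) / p)

def redistribute(remaining_work: List[float], total_procs: int,
                 speedup: Callable[[int], float]) -> List[int]:
    """Hypothetical helper: split total_procs among applications so that the
    latest completion time max_i W_i / speedup(p_i) is minimized.
    Assumes redistribution is free and completion time never grows with p."""
    n = len(remaining_work)
    if total_procs < n:
        raise ValueError("need at least one processor per application")
    procs = [1] * n  # every application keeps at least one processor

    def finish_time(i: int) -> float:
        return remaining_work[i] / speedup(procs[i])

    # Hand out the spare processors one by one to whichever application
    # would currently finish last (ties go to the lowest index).
    for _ in range(total_procs - n):
        latest = max(range(n), key=finish_time)
        procs[latest] += 1
    return procs

if __name__ == "__main__":
    # Fault-free start: three identical applications share 12 processors.
    print(redistribute([100.0, 100.0, 100.0], 12, perfectly_parallel))  # [4, 4, 4]
    # Hypothetical figures: a failure leaves application 0 with more
    # remaining work than the others; rebalance the same 12 processors.
    print(redistribute([150.0, 100.0, 100.0], 12, perfectly_parallel))  # [5, 4, 3]
```

In the second call, the application hit by the failure receives extra processors taken from the others, which is the kind of rebalancing the abstract describes. A realistic model would also charge a cost for moving processors and would account for failures that may strike again before completion.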
Most list scheduling heuristics rely on a simple platform model where communication contentio...
This paper focuses on the resilient scheduling of parallel jobs on high-perfor...
This thesis focuses on resilience for high-performance applications that execute on large-scale plat...
This thesis explores co-scheduling problems in the context of large-scale applications with two main...
This paper investigates co-scheduling algorithms for processing a set of paral...
High performance computing applications must be resilient to faults. The tradi...
We study the scheduling of computational workflows on compute resources that experience exponentially...
Emerging architecture designs include tens of processing cores on a single chip die; it is believed ...
Proc. of the 37th IEEE International Conference on Parallel Processing (ICPP 2008), IEEE Computer Soci...
This thesis consists of two parts: performance bounds for scheduling algorithms for parallel program...
Processor failures in post-petascale parallel computing platforms are common o...