This paper focuses on the resilient scheduling of parallel jobs on high-performance computing (HPC) platforms to minimize the overall completion time, or makespan. We revisit the classical problem while assuming that jobs are subject to transient or silent errors, and hence may need to be re-executed each time they fail to complete successfully. This work generalizes the classical framework, in which jobs are known offline and do not fail. In that framework, list scheduling that gives priority to the longest jobs is known to be a 3-approximation when schedules are restricted to shelves, and a 2-approximation without this restriction. We show that when jobs can fail, using shelves can be arbitrarily bad, but unrestricted list sche...
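To make the classical baseline mentioned above concrete, the following is a minimal sketch of longest-job-first list scheduling for rigid parallel jobs in the failure-free setting only. It is not the paper's algorithm: the function name `list_schedule`, the `(procs, duration)` job representation, and the event-driven loop are assumptions chosen for illustration, and job failures and re-executions are deliberately not modeled.

```python
import heapq


def list_schedule(jobs, total_procs):
    """Greedy list scheduling of rigid parallel jobs, longest job first.

    jobs: list of (procs, duration) tuples (hypothetical representation).
    total_procs: number of identical processors on the platform.
    Returns (makespan, starts), where starts maps job index -> start time.
    Failure-free sketch: no job is ever re-executed.
    """
    for procs, _ in jobs:
        if procs > total_procs:
            raise ValueError("a job requests more processors than available")

    # Priority list: longest duration first.
    pending = sorted(range(len(jobs)), key=lambda i: -jobs[i][1])
    running = []          # min-heap of (finish_time, procs)
    free = total_procs
    now = 0.0
    makespan = 0.0
    starts = {}

    while pending or running:
        # Scan the list in priority order and start every job that fits.
        waiting = []
        for i in pending:
            procs, duration = jobs[i]
            if procs <= free:
                free -= procs
                starts[i] = now
                heapq.heappush(running, (now + duration, procs))
            else:
                waiting.append(i)
        pending = waiting

        # Advance to the next completion and release all processors
        # held by jobs finishing at that instant.
        if running:
            now, procs = heapq.heappop(running)
            free += procs
            while running and running[0][0] == now:
                free += heapq.heappop(running)[1]
            makespan = now

    return makespan, starts
```

For example, `list_schedule([(2, 4), (2, 3), (1, 2), (3, 1)], 4)` starts the two longest jobs at time 0 and fills the remaining gaps as processors are released. Extending the sketch to failing jobs would require re-inserting a failed job into the pending list at its completion time, which is left out here.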
The application of computers in safety-critical systems is expanding rapidly. With reliability speci...
Imprecise computation and parallel processing are two techniques for avoiding timing faults and tole...
Heterogeneous distributed systems are widely deployed for executing computatio...
We study the resilient scheduling of moldable parallel jobs on high-performanc...
This paper focuses on the resilient scheduling of moldable parallel jobs on hi...
Scheduling in High-Performance Computing (HPC) has been traditionally centered...
The optimization of parallel applications is difficult to achieve by classical...
Applications implemented on critical systems are subject to both safety critic...
Most list scheduling heuristics rely on a simple platform model where communication c...
Scheduling deteriorating jobs on parallel machines is an NP-hard problem, for which heuristics would...
When jobs have to be processed on a set of identical parallel machines so as to minimize the makespa...
In this report we describe a greedy algorithm to schedule parallel jobs that consist of independent,...