This paper focuses on the resilient scheduling of parallel jobs on high-performance computing (HPC) platforms to minimize the overall completion time, or makespan. We revisit the problem by assuming that jobs are subject to transient or silent errors, and hence may need to be re-executed each time they fail to complete successfully. This work generalizes the classical framework where jobs are known offline and do not fail: in this classical framework, list scheduling that gives priority to the longest jobs is known to be a 3-approximation when the schedule is restricted to shelves, and a 2-approximation without this restriction. We show that when jobs can fail, using shelves can be arbitrarily bad, but unrestricted list scheduling remains a 2-approximation. Th...
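To make the setting concrete, the following is a minimal sketch of unrestricted (shelf-free) list scheduling with longest-job-first priority and re-execution on failure. It is an illustration under stated assumptions, not the paper's algorithm or analysis: jobs are assumed rigid (a fixed processor count each), an error is assumed to be detected only when an attempt finishes, and the number of failed attempts per job is supplied as input so that the run is deterministic. The function name `list_schedule` and the `(name, procs, time)` job format are hypothetical.

```python
import heapq


def list_schedule(jobs, P, failures=None):
    """Simulate greedy list scheduling on P identical processors.

    jobs     -- list of (name, procs, time) tuples (hypothetical format)
    failures -- dict name -> number of failed attempts before success
                (an assumption made here to keep the simulation deterministic)
    Returns the makespan.
    """
    failures = dict(failures or {})
    # Priority list: longest execution time first.
    ready = sorted(jobs, key=lambda j: -j[2])
    running = []  # min-heap of (finish_time, tie_breaker, job)
    free = P
    now = 0.0
    seq = 0
    while ready or running:
        # Greedy pass: start every listed job that fits on the free processors.
        waiting = []
        for job in ready:
            _, procs, time = job
            if procs <= free:
                free -= procs
                heapq.heappush(running, (now + time, seq, job))
                seq += 1
            else:
                waiting.append(job)
        ready = waiting
        if not running:
            raise ValueError("a job requests more processors than available")
        # Advance time to the next attempt that finishes and release its processors.
        now, _, job = heapq.heappop(running)
        free += job[1]
        # Assumed error model: a failed attempt is detected at completion,
        # and the job goes back into the priority list for re-execution.
        if failures.get(job[0], 0) > 0:
            failures[job[0]] -= 1
            ready.append(job)
            ready.sort(key=lambda j: -j[2])
    return now
```

For example, with `jobs = [("A", 2, 4), ("B", 2, 3), ("C", 1, 2)]` and `P = 4`, the failure-free makespan is 5, and a single failed attempt of job `A` raises it to 8, since `A` must run a second time from time 4 to time 8.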
Most list scheduling heuristics rely on a simple platform model where communication c...
This thesis is focused on the two major problems in the high performance computing context: resilien...
We study the resilient scheduling of moldable parallel jobs on high-performanc...
This paper focuses on the resilient scheduling of moldable parallel jobs on hi...
Applications implemented on critical systems are subject to both safety critic...
When jobs have to be processed on a set of identical parallel machines so as to minimize the makespa...
Imprecise computation and parallel processing are two techniques for avoiding timing faults and tole...
Scheduling in High-Performance Computing (HPC) has been traditionally centered around computing reso...
Scheduling deteriorating jobs on parallel machines is an NP-hard problem, for which heuristics would...