This paper addresses the question of whether more accurate requested runtimes can significantly improve high-performance priority backfill policies for production workloads running on leading-edge systems such as the 1500-processor Origin 2000 system at NCSA or the new TeraGRID. This question has been studied previously for FCFS-backfill using a limited set of performance metrics. The new results for higher-performance backfill policies, heavier system load, and a broader range of performance metrics show that more accurate requested runtimes have much greater potential to improve system performance than previous results suggested. Furthermore, the results show that (a) using user test runs to improve requested runtime estimates can achieve m...
The multi-core era has led to a paradigm shift in the interaction between software and hardware. Mul...
When order release is applied, jobs are withheld in a backlog from where they are released to meet c...
Workload consolidation is a common method to increase resource utilization of the clusters or data c...
Abstract. The FCFS-based backfill algorithm is widely used in scheduling high-performance computer ...
The most commonly used scheduling algorithm for parallel supercomputers is FCFS with backfilling, as ...
Workflow schedulers often rely on task runtime estimates when making scheduling decisions, and they ...
Backfilling is a simple and effective way of improving the utilization of space-sharing schedulers. ...
Abstract. Job scheduling policies for HPC centers have been extensively studied in the last few yea...
The job management system is the HPC middleware responsible for distributing c...
System administrators for parallel computers face many difficulties when managing job scheduling sys...
The issue of under-estimated length of jobs (parallel applications) on backfill-based scheduling is ...
To effectively manage High-Performance Computing (HPC) resources, it is essential to maximize return...
This article focuses on the problem of dealing with low accuracy of job runtime estimates provided b...
Abstract—The complexity of modern computer systems may enable minor variations in performance evalua...