Torus-connected networks are widely used in modern supercomputers due to their linear per-node cost scaling and competitive overall performance. The job scheduling system plays a critical role in the efficient use of supercomputers. As supercomputers continue to grow in size, a fundamental problem arises: how to effectively balance job performance with system performance on torus-connected machines? In this work, we present a new scheduling design named window-based locality-aware scheduling. Our design contains three novel features. First, rather than scheduling jobs one by one, our design takes a “window” of jobs, i.e., multiple jobs, into consideration for job prioritization and resource allocation. Second, our design maintains a list of s...
This paper analyzes job scheduling for parallel computers by using theoretical and experimental mean...
Grantor: University of Toronto. Multiprocessors are being used increasingly to support workl...
Modern high-performance computing (HPC) system designs have converged to heavyweight nodes with grow...
In this paper we investigate the problem of how to schedule n independent jobs on an m × m toru...
Network interference of nearby jobs has been recently identified as the dominant reason for the high...
Abstract. This paper studies the influence that job placement may have on scheduling performance, in...
Resource management and job scheduling is a crucial task on large-scale computing systems. Despite y...
Abstract—Torus-based networks are prevalent on leadership-class petascale systems, providing a good ...
In this paper, we utilize a bandwidth-centric job communication model that captures the i...
Metacomputing is a convenient and powerful abstraction for dealing with the complexities that arise ...
In systems with complex many-core cache hierarchy, exploiting data locality can significantly reduce...
Abstract. Recent success in building petascale computing systems poses new challenges in job schedul...
Abstract—This paper studies the influence that task placement may have on the performance of applica...
Abstract—As systems scale toward exascale, many resources will become increasingly constrained. Whil...