Recent NVIDIA Graphics Processing Units (GPUs) can execute multiple kernels concurrently. On these GPUs, the thread block scheduler (TBS) currently uses the FIFO policy to schedule thread blocks of concurrent kernels. We show that the FIFO policy leaves performance to chance, resulting in significant loss of performance and fairness. To improve performance and fairness, we propose use of the preemptive Shortest Remaining Time First (SRTF) policy instead. Although SRTF requires an estimate of runtime of GPU kernels, we show that such an estimate of the runtime can be easily obtained using online profiling and exploiting a simple observation on GPU kernels' grid structure. Specifically, we propose a novel Structural Runtime Predictor. U...
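To make the contrast between FIFO and preemptive SRTF concrete, the following is a minimal sketch, not the scheduler described in the abstract. It assumes a single shared resource, treats each kernel as one job with an exactly known runtime (the abstract instead relies on a predicted runtime), and ignores thread blocks and preemption cost. The function names `fifo` and `srtf` and the sample arrival times and runtimes are hypothetical, chosen only for illustration.

```python
import heapq


def fifo(jobs):
    """Mean turnaround time when jobs run to completion in arrival order.

    jobs: list of (arrival_time, runtime) tuples (hypothetical values).
    """
    t = 0.0
    total = 0.0
    for arrival, runtime in sorted(jobs):
        t = max(t, arrival) + runtime  # wait for arrival, then run to the end
        total += t - arrival
    return total / len(jobs)


def srtf(jobs):
    """Mean turnaround time under preemptive Shortest Remaining Time First.

    Assumes exact runtimes are known in advance, which a real scheduler
    would have to estimate.
    """
    pending = sorted(jobs)          # jobs not yet arrived, by arrival time
    ready = []                      # min-heap of (remaining_time, arrival_time)
    t = 0.0
    total = 0.0
    i = 0
    n = len(pending)
    while i < n or ready:
        if not ready:
            t = max(t, pending[i][0])   # idle until the next arrival
        while i < n and pending[i][0] <= t:
            arrival, runtime = pending[i]
            heapq.heappush(ready, (runtime, arrival))
            i += 1
        remaining, arrival = heapq.heappop(ready)
        # Run the shortest job until it finishes or a new job arrives,
        # at which point the scheduling decision is revisited.
        next_arrival = pending[i][0] if i < n else float("inf")
        run = min(remaining, next_arrival - t)
        t += run
        remaining -= run
        if remaining > 0:
            heapq.heappush(ready, (remaining, arrival))
        else:
            total += t - arrival
    return total / n


if __name__ == "__main__":
    # One long job arrives first, followed by two short ones.
    sample = [(0, 10), (1, 2), (2, 1)]
    print("FIFO mean turnaround:", fifo(sample))
    print("SRTF mean turnaround:", srtf(sample))
```

In this toy workload the long job that arrives first delays both short jobs under FIFO, whereas SRTF preempts it as soon as a shorter job arrives. The outcome under FIFO depends entirely on arrival order, which is one way to read the abstract's remark that FIFO leaves performance to chance.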
Multiprocessor systems are increasingly becoming the systems of choice for low and high-end server...
The objective of this thesis is the development, implementation and optimization of a GPU execution ...
Heterogeneous systems consisting of multiple CPUs and GPUs are increasingly attractive as platforms ...
Modern GPUs allow concurrent kernel execution and preemption to improve hardwa...
As modern GPU workloads become larger and more complex, there is an ever-increasing demand for GPU c...
Graphic Processing Units (GPUs) achieve latency tolerance by exploiting massive amounts of ...
The current trend in recently released Graphic Processing Units (GPUs) is to exploit transistor scal...
Nowadays GPU clusters are available in almost every data processing center. Their GPUs are typically...
Execution of GPGPU workloads consists of different stages including data I/O on the CPU, memory copy...
Thread coarsening on GPUs combines the work of several threads into one. We show how thread coarseni...
Concurrent kernel execution is a relatively new feature in modern GPUs, which was designed to improv...
As the complexity of applications continues to grow, each new generation of GPUs has been equipped w...
In this study, we provide an extensive survey of a wide spectrum of scheduling methods for multitaskin...
GPUs are being increasingly adopted as compute accelerators in many domains, spanning environments f...