Recent NVIDIA Graphics Processing Units (GPUs) can execute multiple kernels concurrently. On these GPUs, the thread block scheduler (TBS) currently uses the FIFO policy to schedule thread blocks of concurrent kernels. We show that the FIFO policy leaves performance to chance, resulting in significant loss of performance and fairness. To improve performance and fairness, we propose use of the preemptive Shortest Remaining Time First (SRTF) policy instead. Although SRTF requires an estimate of runtime of GPU kernels, we show that such an estimate of the runtime can be easily obtained using online profiling and exploiting a simple observation on GPU kernels' grid structure. Specifically, we propose a novel Structural Runtime Predictor. Usin...
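To make the FIFO versus SRTF contrast in the abstract above concrete, the following is a minimal sketch in Python, not the paper's scheduler. It models the GPU as a single resource that runs one job at a time, and it takes each job's runtime as a known input, standing in for whatever estimate a runtime predictor would supply. The function names `fifo_turnaround` and `srtf_turnaround` and the job tuples are hypothetical and chosen only for illustration.

```python
import heapq


def fifo_turnaround(jobs):
    """Mean turnaround time when jobs run to completion in arrival order.

    jobs: list of (arrival_time, runtime) tuples (hypothetical inputs).
    """
    clock = 0.0
    total = 0.0
    for arrival, runtime in sorted(jobs):
        clock = max(clock, arrival) + runtime
        total += clock - arrival
    return total / len(jobs)


def srtf_turnaround(jobs):
    """Mean turnaround time under preemptive Shortest Remaining Time First.

    At every arrival the running job may be preempted in favour of the
    job with the least remaining work.
    """
    pending = sorted(jobs)          # ordered by arrival time
    ready = []                      # min-heap of (remaining, arrival)
    clock = 0.0
    total = 0.0
    i, n = 0, len(pending)
    while i < n or ready:
        if not ready:
            # Idle until the next job arrives.
            clock = max(clock, pending[i][0])
        while i < n and pending[i][0] <= clock:
            arrival, runtime = pending[i]
            heapq.heappush(ready, (runtime, arrival))
            i += 1
        remaining, arrival = heapq.heappop(ready)
        # Run until this job finishes or the next job arrives.
        next_arrival = pending[i][0] if i < n else float("inf")
        run = min(remaining, next_arrival - clock)
        clock += run
        remaining -= run
        if remaining > 0:
            heapq.heappush(ready, (remaining, arrival))
        else:
            total += clock - arrival
    return total / n


if __name__ == "__main__":
    # One long job arrives first, followed by two short ones.
    jobs = [(0, 10), (1, 2), (2, 1)]
    print("FIFO mean turnaround:", fifo_turnaround(jobs))
    print("SRTF mean turnaround:", srtf_turnaround(jobs))
```

In this toy workload the long job that arrives first delays both short jobs under FIFO, while SRTF lets the short jobs finish first. A real thread block scheduler works at thread-block granularity across many streaming multiprocessors, which this sketch deliberately ignores.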
Concurrent kernel execution is a relatively new feature in modern GPUs, which was designed to improv...
Heterogeneous systems consisting of multiple CPUs and GPUs are increasingly attractive as platforms ...
The objective of this thesis is the development, implementation and optimization of a GPU execution ...
Recent NVIDIA Graphics Processing Units (GPUs) can execute multiple kernels concurrently. On these ...
Graphics Processing Units (GPUs) achieve latency tolerance by exploiting massive amounts of ...
As modern GPU workloads become larger and more complex, there is an ever-increasing demand for GPU c...
The current trend in recently released Graphics Processing Units (GPUs) is to exploit transistor scal...
Modern GPUs allow concurrent kernel execution and preemption to improve hardwa...
As the complexity of applications continues to grow, each new generation of GPUs has been equipped w...
GPUs are being increasingly adopted as compute accelerators in many domains, spanning environments f...
Nowadays GPU clusters are available in almost every data processing center. Their GPUs are typically...
Execution of GPGPU workloads consists of different stages including data I/O on the CPU, memory copy...
In this study, we provide an extensive survey of a wide spectrum of scheduling methods for multitaskin...
Graphics processors, or GPUs, have recently been widely used as accelerators in shared envi...
Thread coarsening on GPUs combines the work of several threads into one. We show how thread coarseni...