GPU memory systems adopt a multi-dimensional hardware structure to provide the bandwidth necessary to support hundreds to thousands of concurrent threads. On the software side, GPU-compute workloads also use multi-dimensional structures to organize their threads. We observe that these structures can combine unfavorably and create significant resource imbalance in the memory subsystem, causing low performance and poor power efficiency. The key issue is that which memory address bits exhibit high variability is highly application-dependent. To solve this problem, we first provide an entropy analysis approach tailored for the highly concurrent memory request behavior in GPU-compute workloads. Our window-based entropy metric captures the information ...
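The abstract is cut off before it defines the window-based entropy metric, so the following is only a minimal sketch of one plausible reading, not the authors' formulation. It assumes the metric is the Shannon entropy of each address bit, computed inside fixed-size, non-overlapping windows of consecutive memory requests and then averaged over the windows. The names `bit_entropy` and `window_entropy`, the parameters `num_bits` and `window`, and the non-overlapping window choice are all hypothetical.

```python
import math
from typing import List, Sequence


def bit_entropy(p_one: float) -> float:
    """Shannon entropy (in bits) of a binary value that is 1 with probability p_one."""
    if p_one <= 0.0 or p_one >= 1.0:
        return 0.0  # a bit that never changes carries no information
    p_zero = 1.0 - p_one
    return -(p_one * math.log2(p_one) + p_zero * math.log2(p_zero))


def window_entropy(addresses: Sequence[int], num_bits: int, window: int) -> List[float]:
    """Average per-bit entropy over non-overlapping windows of `window` requests.

    Assumption (not from the source): each window stands in for a group of
    requests that are in flight at roughly the same time.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    totals = [0.0] * num_bits
    num_windows = 0
    # Only full windows are counted; a trailing partial window is ignored.
    for start in range(0, len(addresses) - window + 1, window):
        chunk = addresses[start:start + window]
        for bit in range(num_bits):
            ones = sum((addr >> bit) & 1 for addr in chunk)
            totals[bit] += bit_entropy(ones / window)
        num_windows += 1
    if num_windows == 0:
        return [0.0] * num_bits
    return [total / num_windows for total in totals]


if __name__ == "__main__":
    trace = list(range(8))  # toy address trace: 0, 1, ..., 7
    print(window_entropy(trace, num_bits=4, window=4))
    print(window_entropy(trace, num_bits=4, window=8))
```

In this toy trace, bit 2 toggles once across the whole trace but is constant inside each window of four requests, so it scores zero with `window=4` and one with `window=8`. Under the stated assumptions, this illustrates why a bit that looks variable over a full trace may still be a poor choice for spreading concurrent requests across channels or banks.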
Graphics processing units (GPUs) have become prevalent in modern computing systems. While their high...
Graphics Processing Unit (GPU)-based architectures have become the default accelerator choice for a ...
Recent studies on commercial hardware demonstrated that irregular GPU workloads could bottleneck on ...
The massive parallelism provided by general-purpose GPUs (GPGPUs) possessing numerous compute thread...
General-purpose Graphics Processing Units (GPGPUs) are an important class of architectures that offe...
As GPUs' compute capabilities grow, their memory hierarchy increasingly becomes a bottleneck. C...
The continued growth of the computational capability of throughput processors has made throughput...
Modern Graphics Processing Units (GPUs) provide much higher off-chip memory bandwidth than CPUs, but...
With the SIMT execution model, GPUs can hide memory latency through massive multithreading ...
Pervasive use of GPUs across multiple disciplines is a result of continuous adaptation of the GPU a...
Traditionally, GPUs only had programmer-managed caches. The advent of hardware-managed caches accele...
GPUs rely heavily on massive multi-threading to achieve high throughput. The massive multi-threadin...
Massively parallel processing devices, like Graphics Processing Units (GPUs), have the ability to ac...
Current GPU computing models support a mixture of coherent and incoherent classes of memory operatio...