GPU memory systems adopt a multi-dimensional hardware structure to provide the bandwidth necessary to support hundreds to thousands of concurrent threads. On the software side, GPU-compute workloads also use multi-dimensional structures to organize their threads. We observe that these structures can combine unfavorably and create significant resource imbalance in the memory subsystem, causing low performance and poor power efficiency. The key issue is that which memory address bits exhibit high variability is highly application-dependent. To solve this problem, we first provide an entropy analysis approach tailored to the highly concurrent memory request behavior of GPU-compute workloads. Our window-based entropy metric captures the information co...
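To make the idea of per-bit address variability concrete, the following is a minimal sketch, not the metric defined in the abstract above (whose definition is truncated here). It assumes plain Shannon entropy of each address bit, computed over fixed-size, non-overlapping windows of a request stream and then averaged. The function names `bit_entropy` and `windowed_bit_entropy`, the window size, and the toy address stream are all hypothetical choices for illustration.

```python
import math


def bit_entropy(addresses, bit):
    """Shannon entropy (in bits) of a single address bit over a list of addresses."""
    if not addresses:
        return 0.0
    ones = sum((a >> bit) & 1 for a in addresses)
    p = ones / len(addresses)
    if p == 0.0 or p == 1.0:
        # A bit that never changes carries no information.
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def windowed_bit_entropy(addresses, num_bits, window):
    """Average per-bit entropy over consecutive, non-overlapping windows.

    Assumption: fixed-size windows over the request order stand in for
    "requests that are in flight together"; a real analysis would pick
    the window from the workload's concurrency.
    """
    windows = [addresses[i:i + window] for i in range(0, len(addresses), window)]
    if not windows:
        return [0.0] * num_bits
    return [
        sum(bit_entropy(w, b) for w in windows) / len(windows)
        for b in range(num_bits)
    ]


# Toy stream: addresses 0..7 issued in order (hypothetical data).
stream = list(range(8))
small = windowed_bit_entropy(stream, num_bits=4, window=4)
large = windowed_bit_entropy(stream, num_bits=4, window=8)
print(small)  # bit 2 is constant inside each 4-request window
print(large)  # bit 2 varies once the window covers all 8 requests
```

In this toy stream, bit 2 looks constant inside each window of four requests but varies across the whole stream, which illustrates why the choice of window matters when deciding which address bits are good candidates for selecting channels or banks.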
As modern GPUs rely partly on their on-chip memories to counter the imminent off-chip memory wall, t...
Modern Graphics Processing Units (GPUs) provide much higher off-chip memory bandwidth than CPUs, but...
Traditionally, GPUs only had programmer-managed caches. The advent of hardware-managed caches accele...
The massive parallelism provided by general-purpose GPUs (GPGPUs) possessing numerous compute thread...
General-purpose Graphics Processing Units (GPGPUs) are an important class of architectures that offe...
Graphics Processing Unit (GPU)-based architectures have become the default accelerator choice for a ...
The continued growth of the computational capability of throughput processors has made throughput...
As GPUs' compute capabilities grow, their memory hierarchy increasingly becomes a bottleneck. C...
Current GPU computing models support a mixture of coherent and incoherent classes of memory operatio...
GPUs rely heavily on massive multi-threading to achieve high throughput. The massive multi-threadin...
This paper presents the MEMPower power model. MEMPower is a detailed empirical power model for GPU m...
Pervasive use of GPUs across multiple disciplines is a result of continuous adaptation of the GPU a...
The usage of Graphics Processing Units (GPUs) as an application accelerator has become increasingly ...
With the massive multithreading execution feature, graphics processing units (GPUs) have b...