Maintaining computational load balance is important to the performance of codes that operate under a distributed computing model. This is especially true for GPU architectures, which can suffer from memory oversubscription if improperly load balanced. We present enhancements to traditional load balancing approaches that explicitly target GPU architectures, and we explore the resulting performance. A key component of our enhancements is the introduction of several GPU-amenable strategies for assessing compute work. These strategies are implemented and benchmarked to find the most effective data collection methodology for in-situ assessment of GPU compute work. For the fully kinetic particle-in-cell code WarpX, which supports MPI+CUDA parall...
We present a portable platform, called PIC_ENGINE, for accelerating Particle-In-Cell (PIC) c...
We propose a GPU fine-grained load-balancing abstraction that decouples load balancing from work pro...
Using multi-GPU systems, including GPU clusters, is gaining popularity in scientific computing. Howe...
The computational power provided by many-core graphics processing units (GPUs) has been exploited i...
We present 'jasmine', an implementation of a fully relativistic, 3D, electromagnetic Particle-In-Cel...
Many applications with regular parallelism have been shown to benefit from using Graphics Processing...
In recent years the power wall has prevented the continued scaling of single core performance. This ...
Heterogeneous computing systems using one or more graphics processing units (GPUs) as accelerators p...
Graphics Processing Units (GPUs) have been widely adopted to accelerate the execution of HPC workload...
Power-performance efficiency has become a central focus that is challenging in heterogeneous process...
The relentless demands for improvements in compute throughput and energy efficiency have driven...
Using two full applications with different characteristics, this thesis explores the performance and...
Recent advances in GPUs (graphics processing units) lead to massively parallel hardware that is eas...
Graphic processors are becoming faster and faster. Computational power within graphic processing uni...