The graphics processing unit (GPU) is designed to manipulate large amounts of memory quickly. Using its full capacity requires a deeper understanding of the underlying architecture. This thesis presents a simple yet flexible Copy API for efficiently moving N-dimensional data fragments between the memory spaces of a GPU. We introduce several strategies for distributing fine-grained parallelism over a user-given workload and then benchmark them to show how widely their performance can vary. Finally, we demonstrate the Copy API on several algebraic applications, highlighting the advantages of having simple and flexible data-movement functions.
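The abstract does not spell out the Copy API's interface, so the following is only a minimal sketch under stated assumptions. It assumes a fragment is described by per-dimension extents plus element strides into the source and destination buffers. The names `Fragment`, `element_count`, `copy_element`, `copy_interleaved`, and `copy_blocked` are hypothetical and are not the thesis's API. The host-side C++ shows the core of an N-dimensional strided copy, where a flat element index is decomposed into coordinates. It then shows two ways of dividing that element-wise work among T threads. The outer loop over `t` stands in for threads that would run concurrently on a GPU.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical fragment descriptor (an assumption, not the thesis's API):
// extent[d] elements along dimension d, with element strides into the
// source and destination buffers. The last dimension varies fastest.
template <std::size_t N>
struct Fragment {
    std::array<std::size_t, N> extent;
    std::array<std::size_t, N> src_stride;
    std::array<std::size_t, N> dst_stride;
};

// Total number of elements in the fragment.
template <std::size_t N>
std::size_t element_count(const Fragment<N>& f) {
    std::size_t total = 1;
    for (std::size_t e : f.extent) total *= e;
    return total;
}

// Copy one element, identified by its flat (row-major) index, by decomposing
// the index into N coordinates and applying each side's strides.
template <typename T, std::size_t N>
void copy_element(const Fragment<N>& f, const T* src, T* dst, std::size_t flat) {
    std::size_t s = 0, d = 0;
    for (std::size_t dim = N; dim-- > 0;) {  // innermost dimension first
        const std::size_t coord = flat % f.extent[dim];
        flat /= f.extent[dim];
        s += coord * f.src_stride[dim];
        d += coord * f.dst_stride[dim];
    }
    dst[d] = src[s];
}

// Strategy A (interleaved, grid-stride): thread t handles t, t + T, t + 2T, ...
// Adjacent threads handle adjacent flat indices, which with a unit innermost
// stride means adjacent addresses.
template <typename T, std::size_t N>
void copy_interleaved(const Fragment<N>& f, const T* src, T* dst, std::size_t num_threads) {
    assert(num_threads > 0);
    const std::size_t total = element_count(f);
    for (std::size_t t = 0; t < num_threads; ++t)  // concurrent on a GPU, sequential here
        for (std::size_t i = t; i < total; i += num_threads)
            copy_element(f, src, dst, i);
}

// Strategy B (blocked): thread t handles one contiguous chunk of
// ceil(total / T) flat indices; trailing threads may get no work.
template <typename T, std::size_t N>
void copy_blocked(const Fragment<N>& f, const T* src, T* dst, std::size_t num_threads) {
    assert(num_threads > 0);
    const std::size_t total = element_count(f);
    const std::size_t chunk = (total + num_threads - 1) / num_threads;
    for (std::size_t t = 0; t < num_threads; ++t) {  // concurrent on a GPU, sequential here
        const std::size_t begin = std::min(total, t * chunk);
        const std::size_t end = std::min(total, begin + chunk);
        for (std::size_t i = begin; i < end; ++i)
            copy_element(f, src, dst, i);
    }
}
```

For example, take a 4x5 row-major matrix `m`. The call `Fragment<2> f{{{2, 3}}, {{5, 1}}, {{3, 1}}}` together with `copy_interleaved(f, m.data() + 6, out.data(), 4)` copies the 2x3 block that starts at row 1, column 1 into a dense buffer. On a GPU, the interleaved assignment typically favors coalesced global-memory access, because neighboring threads read neighboring addresses along the unit-stride dimension. The blocked assignment gives each thread a contiguous range instead. Which strategy performs better is expected to depend on the fragment shape and the hardware.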