Many simulations in the physical sciences are expressed in terms of rectilinear arrays of variables. It is attractive to develop such simulations for use in 1-, 2-, 3- or arbitrary physical dimensions, and in a manner that supports exploitation of data-parallelism on fast modern processing devices. We report on data layouts and transformation algorithms that support both conventional and data-parallel memory layouts. We present our implementations in both conventional serial C code and NVIDIA's Compute Unified Device Architecture (CUDA) concurrent programming language for use on general-purpose graphical processing units. We discuss: general memory layouts; specific optimizations possible for dimensions that are powers-of-t...
Using two full applications with different characteristics, this thesis explores the performance and...
This paper studies the CUDA programming challenges with using multiple GPUs inside a single machine ...
The objective of this thesis is to optimize the Seam Carving method in CUDA (Compute Unified Device ...
Many simulations in the physical sciences are expressed in terms of rectilinear arrays of variables....
Matrix transposition is an important algorithmic building block for many numeric algorithms like m...
We present four CUDA based parallel implementations of the Space-Saving algorithm for determining fr...
The future of computation is the GPU, i.e. the Graphical Processing Unit. The graphics cards have sh...
Graphics Processing Units (GPUs) are a fast evolving architecture. Over the last decade their progra...
Data-parallel accelerator devices such as Graphical Processing Units (GPUs) are providing dramatic p...
Graphical processing units (GPUs) have recently attracted attention for scientific applications such...
Scientific and engineering computing requires operation on flooded amount of d...
The need to analyze large amounts of multivariate data raises the fundamental problem of dimensiona...
To design objects or entire scenes in real time with free-form surfaces makes a shifting process to ...
Stochastic simulations involve multiple replications in order to build confide...