The energy costs of data movement are limiting the performance scaling of future generations of high-performance computing architectures targeted at data-intensive applications. The result has been a resurgence of interest in processing-in-memory (PIM) architectures. This challenge has spawned the development of a scalable, parametric data-parallel architecture referred to as the Heterogeneous Architecture Research Prototype (HARP), a single instruction multiple thread (SIMT) architecture for integration into DRAM systems, particularly 3D memory stacks, as a distinct processing layer to exploit the enormous internal memory bandwidth. However, this potential can only be realized with an optimizing compilation environment. This thesis addres...
This paper proposes a new processor architecture for handling hard-to-predict branches, the diverge-...
In this dissertation, we address the problem of runtime adaptation of the application to its executi...
Can today's most advanced compiler generation systems handle specialized parallel processor arc...
Data-parallel architectures must provide efficient support for complex control-flow constru...
This paper describes methods to adapt existing optimizing compilers for sequential languages to prod...
In this thesis we describe techniques for code generation and global optimization for a PRAM-NUMA mu...
Growing interest in graphics processing units has brought renewed attention to...
This document examines the effects of computational mode on the performance of parallel applications...
Parallel architectures following the SIMT model such as GPUs benefit from application regularity by ...
Graphics processing units (GPUs) are composed of a group of single-instruction multiple data (SIMD) s...
Power consumption and fabrication limitations are increasingly playing significant roles in the desi...
Dynamic predication has been proposed to reduce the branch misprediction penalty due to hard-to-pred...
As an effective way of utilizing data parallelism in applications, SIMD architecture has been adopte...
We propose a compiler analysis pass for programs expressed in the Single Program, Multiple Data (SPM...
SIMD architectures offer an alternative to MIMD architectures for obtaining high performance computa...