Performance characteristics of irregular programs on parallel architectures were studied. Results indicated significant overheads from thread divergence and register utilization on the GPU and suboptimal thread migration patterns on the EMU. Compiler and architecture optimizations addressing these inefficiencies were designed and implemented, and performance data were collected. These optimizations included instruction and thread scheduling, as well as resource allocation techniques. Findings showed the potential for significant performance improvements for irregular programs executing on the GPU or the EMU. Further analysis revealed both positive and negative implications of other compiler phases and program characteristics on the perform...
With the introduction of multi-core processors, thread affinity has quickly ap...
In the past decade, accelerators, commonly Graphics Processing Units (GPUs), have played a key role ...
Programming models such as CUDA and OpenCL allow the programmer to specify the independence of threa...
GPUs are an increasingly popular implementation platform for a variety of general purpose applicatio...
In recent processor development, we have witnessed the integration of GPU and CPUs into a single ch...
This paper presents super-threading, which generically means the architectural and software mechanis...
This work examines the interaction of compiler scheduling techniques with processor features such as...
To help shrink the programmability-performance efficiency gap, we discuss that adaptive runtime syst...
Thread-level parallelism of applications is commonly exploited using multithreaded processors. In suc...
This thesis investigates parallelism and hardware design trade-offs of parallel and pipelined archit...