Abstract. Although automated empirical performance optimization and tuning is well studied for kernels and domain-specific libraries, a current research grand challenge is how to extend these methodologies and tools to significantly larger sequential and parallel applications. In this context, we present the ROSE source-to-source outliner, which addresses the problem of extracting tunable kernels out of whole programs, thereby helping to convert the challenging whole-program tuning problem into a set of more manageable kernel tuning tasks. Our outliner aims to handle large-scale C/C++, Fortran, and OpenMP applications. A set of program analysis and transformation techniques is used to enhance the portability, scalability, and interope...
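As a rough illustration of what outlining means in the abstract above, the following C sketch moves a loop out of its enclosing routine into a separate function that a tuner could then compile and time on its own. It is a hand-written example, not output of the ROSE outliner: the names scale_add_inline, scale_add_outlined, and OUT_1_kernel are hypothetical, and passing the scalars by address is an assumption about one plausible parameter-passing choice.

```c
#include <stddef.h>

/* Before outlining: the candidate kernel (the loop) is embedded in a
 * larger routine and cannot be tuned in isolation. */
void scale_add_inline(double *y, const double *x, double a, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* After outlining: the loop lives in its own function (hypothetical name).
 * Every variable the loop uses becomes a parameter; the scalars are passed
 * by address here, which is an assumption made for this sketch. */
void OUT_1_kernel(double *y, const double *x,
                  const double *a_p, const size_t *n_p)
{
    const double a = *a_p;   /* unpack the scalars once, outside the loop */
    const size_t n = *n_p;
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* The original routine now only calls the outlined kernel, so the kernel
 * can be swapped for a tuned variant without touching the caller. */
void scale_add_outlined(double *y, const double *x, double a, size_t n)
{
    OUT_1_kernel(y, x, &a, &n);
}
```

Both versions compute the same result; the difference is that the outlined kernel is a standalone unit that can be extracted, parameterized, and tuned separately from the rest of the program.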
In this dissertation, we show that source-to-source optimization is an efficient method to generate ...
The architecture diversity of many-core processors - with their different types of cores, and memory...
In high-performance computing, excellent node-level performance is required for the efficient use of...
ROSE represents a programmable preprocessor for the highly aggressive optimization of C++ object-ori...
The excessive complexity of both machine architectures and applications has made it difficult for c...
Compile-time optimizations generally improve program performance. Nevertheless, degradations caused ...
Graphics Processing Units (GPUs) have revolutionized the computing landscape in the past decade and ...
Graphics Processing Units (GPUs) have revolutionized the HPC landscape. The first generation of exas...
Abstract. The complexity of modern architectures requires compilers to apply an increasingly large coll...
This paper presents an automated performance tuning solution, which partitions a program into a numb...
Abstract. In many cases, simple analytical models used by traditional compilers are no longer able t...
146 p. Thesis (Ph.D.), University of Illinois at Urbana-Champaign, 2008. My work discusses various str...
Application codes reliably underperform the advertised performance of existing architectures, compi...