For many numerical codes, the transport of data from main memory to the registers is commonly considered the main factor limiting performance on current microarchitectures. This limitation is referred to as the memory wall. Much research addresses it at different levels, for example through code transformations and architecture-aware data structures that make optimal use of the memory hierarchy found in all current microarchitectures. This work shows that on modern microarchitectures it is also necessary to take the requirements of the Single Instruction Multiple Data (SIMD) programming paradigm and of data prefetching into account to reach high efficiency. In this thesis the chain from high level ...
SIMD architectures offer an alternative to MIMD architectures for obtaining high performance computa...
This paper describes methods to adapt existing optimizing compilers for sequential languages to prod...
The memory-wall problem is a big challenge that classical Von Neumann-based computer systems face. T...
In order to mitigate the impact of the constantly widening gap between processor speed and main memo...
This thesis describes novel techniques and test implementations for optimizing numerically intensive...
The trend in high-performance microprocessor design is toward increasing computational power on the ...
Many current computer designs employ caches and a hierarchical memory architecture. The s...
In recent years, there has been much effort in commercial compilers (icc, gcc) to exploit efficien...
The nominal peak speeds of both serial and parallel computers are rising rapidly. At the same time h...
We present a simple and novel framework for generating blocked codes for high-performance machines w...
Processor clock frequencies and the related performance improvements recently stagnated due to sever...
The performance of the memory hierarchy has become one of the most critical elements in the performa...