To keep up with a large degree of instruction-level parallelism (ILP), the Itanium 2 cache systems use a complex organization scheme: load/store queues, banking and interleaving. In this paper, we study the impact of these cache systems on memory instruction scheduling. We demonstrate that, if no care is taken at compile time, the imprecise memory disambiguation mechanism and the banking structure cause severe performance loss, even for very simple regular codes. We also show that grouping the memory operations in a pseudo-vectorized way enables the compiler to generate more effective code for the Itanium 2 processor. The impact of this code optimization technique on register pressure is analyzed for various vectori...
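To make the banking argument concrete, here is a minimal sketch of an interleaved, banked cache and of how regrouping the same set of loads changes the number of bank conflicts. Everything in it is an assumption for illustration, not the Itanium 2's documented behavior or the paper's exact transformation: the bank count (`NUM_BANKS = 16`), the interleaving granularity (`BANK_WIDTH = 16` bytes), the rule that any two accesses to the same bank in one issue group conflict, and the helper names (`bank_of`, `naive_groups`, `grouped_groups`) are all hypothetical. The two arrays are assumed to start 4096 bytes apart, a multiple of `NUM_BANKS * BANK_WIDTH`, so that `a[i]` and `b[i]` always map to the same bank.

```python
# Simplified model of an interleaved, banked cache.
# All parameters below are hypothetical, chosen only for illustration.
NUM_BANKS = 16   # hypothetical number of banks
BANK_WIDTH = 16  # hypothetical interleaving granularity, in bytes
ELEM = 8         # size of a double, in bytes


def bank_of(addr):
    """Bank selected by round-robin interleaving of BANK_WIDTH-byte chunks."""
    return (addr // BANK_WIDTH) % NUM_BANKS


def group_conflicts(group):
    """Accesses in one issue group that hit a bank already used by that group.

    Model assumption: any two same-bank accesses in a group conflict.
    """
    banks = [bank_of(addr) for addr in group]
    return len(banks) - len(set(banks))


def total_conflicts(groups):
    return sum(group_conflicts(g) for g in groups)


def naive_groups(base_a, base_b, n):
    # Issue a[i] and b[i] together, as a direct schedule of the loop body would.
    return [[base_a + ELEM * i, base_b + ELEM * i] for i in range(n)]


def grouped_groups(base_a, base_b, n):
    # The same loads, regrouped four iterations at a time (n must be a
    # multiple of 4): each group pairs two elements of ONE array that are
    # 16 bytes apart, so under this model they fall into adjacent banks.
    groups = []
    for i in range(0, n, 4):
        for base in (base_a, base_b):
            groups.append([base + ELEM * i, base + ELEM * (i + 2)])
            groups.append([base + ELEM * (i + 1), base + ELEM * (i + 3)])
    return groups


if __name__ == "__main__":
    n = 16
    print(total_conflicts(naive_groups(0, 4096, n)))    # 16
    print(total_conflicts(grouped_groups(0, 4096, n)))  # 0
```

Both schedules issue the same 32 loads in 16 two-wide groups; only the pairing differs. Under these assumptions the direct schedule conflicts in every group, while the regrouped one conflicts in none, which mirrors in miniature why the order in which a compiler groups memory operations matters on a banked cache.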
This work examines the interaction of compiler scheduling techniques with processor features such as...
We present a novel, compile-time method for determining the cache performance of the loop nests in a...
An emerging trend in processor design is the addition of short vector instructions to general-purpos...
To keep up with a large degree of ILP, the Itanium 2 L2 cache system uses a complex...
Memory disambiguation mechanisms, coupled with load/store queues in out-of-ord...
This paper presents an experimental study on cache memory designs for vector computers. We use an ex...
In order to mitigate the impact of the constantly widening gap between processor speed and main memo...
Obtaining high performance without machine-specific tuning is an important goal of scientific applic...
The trend in high-performance microprocessor design is toward increasing computational power on the ...
Over the past decade, microprocessor design strategies have focused on increasing the computational ...
Effective global instruction scheduling techniques have become an important component in modern comp...
In this paper, we use execution-driven simulation to study and compare vector processing performance...
Current microprocessors incorporate techniques to exploit instruction-level parallelism...
Processor speeds continue to improve at a faster rate than memory access times. The issue of...
In the past decade, processor speed has become significantly faster than memory speed. Small, fast c...