A static memory reference exhibits a unique property when its dynamic memory addresses are congruent with respect to some non-trivial modulus. Extracting this congruence information at compile time enables new classes of program optimization. In this paper, we present methods for forcing congruence among the dynamic addresses of a memory reference. We also introduce a compiler algorithm for detecting this property. Our transformations require no interprocedural analysis and introduce almost no overhead. As a result, they can be incorporated into real compilation systems. On average, our transformations achieve a five-fold increase in the number of congruent memory operations. We are then able to detect 95% of these refere...
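The congruence property described above can be illustrated with a minimal sketch (not taken from the paper): for a loop that accesses one field of each record in a contiguous array, every dynamic address generated by that single static reference shares the same residue modulo the record stride. The names `base`, `stride`, and `field_offset` are illustrative assumptions, not terminology from the abstract.

```python
# Sketch: dynamic addresses of a static reference `a[i].f` over a loop,
# assuming contiguous records of size `stride` starting at `base`.
# All such addresses are congruent modulo `stride`.

def reference_addresses(base, stride, field_offset, n):
    """Addresses touched by the reference across n loop iterations."""
    return [base + i * stride + field_offset for i in range(n)]

addrs = reference_addresses(base=0x1000, stride=32, field_offset=8, n=6)

# Every address has the same residue (the field offset) mod the stride,
# which is exactly the compile-time fact a congruence analysis extracts:
residues = {a % 32 for a in addrs}
assert residues == {8}
```

A compiler that proves this congruence statically can, for example, pick aligned wide loads or schedule the reference onto a specific cache bank without runtime checks.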
In the past decade, processor speed has become significantly faster than memory speed. Small, fast c...
This thesis investigates compiler algorithms to transform program and data to utilize efficiently th...
To exploit instruction level parallelism, it is important not only to execute multiple memory refere...
The performance of the memory hierarchy has become one of the most critical elements in the performa...
The increase in the latencies of memory operations can be attributed to the increasing disparity bet...
As memory system performance becomes an increasingly dominant factor in overall system performance, ...
We present the internal representation and optimizations used by the CASH compiler for improving the...
To expose sufficient instruction-level parallelism (ILP) to make effective use of wide-issue supersc...
Most memory references in numerical codes correspond to array references whose indices are affine fu...
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Com...
Processor performance is directly impacted by the latency of the memory system. As processor core cy...
Over the past decade, microprocessor design strategies have focused on increasing the computational ...
Program redundancy analysis and optimization have been an important component in optimizing compiler...
To expose sufficient instruction-level parallelism (ILP) to make effective use of wide-issue supersc...
Over the last several decades, two important shifts have taken place in the computing world: first, ...