Processor design techniques such as pipelining, superscalar execution, and VLIW have dramatically decreased the average number of clock cycles per instruction. As a result, each execution cycle has become more significant to overall system performance. To maximize the effectiveness of each cycle, one must expose instruction-level parallelism and employ techniques that tolerate memory latency. However, without special architectural support, a superscalar compiler cannot effectively accomplish these two tasks in the presence of control and memory access dependences. Preloading is a class of architectural support that allows memory reads to be performed early in spite of potential violations of control and memory access dependences. With preload support, a s...
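The abstract describes preloading only at a high level. As a rough illustration of the idea, the C sketch below contrasts a load that must wait for a guarding branch and a possibly aliasing store with a version in which the read is started early and re-checked afterwards. The function names, the pointer-equality alias check, and the assumption that the early access cannot fault are illustrative simplifications, not the paper's actual mechanism; real preload hardware would provide non-faulting speculative loads and detect conflicting stores itself.

```c
#include <stddef.h>

/* Baseline: the load of a[i] is control dependent on the bounds check
 * and memory dependent on the store through p, so without architectural
 * support the compiler cannot schedule it any earlier. */
int read_baseline(int *a, int *p, size_t i, size_t n) {
    *p = 0;                     /* store that may alias a[i] */
    if (i < n)
        return a[i];            /* load issues only after branch and store resolve */
    return -1;
}

/* Hypothetical preloaded version: the read is started early and verified
 * later.  This C sketch only mimics the schedule; it assumes a[i] is safe
 * to touch even when i >= n, which non-faulting preload hardware would
 * guarantee. */
int read_preloaded(int *a, int *p, size_t i, size_t n) {
    int early = a[i];           /* preload: value fetched ahead of its uses */
    *p = 0;                     /* possibly conflicting store */
    if (i < n) {
        if (p == &a[i])         /* memory dependence violated: reload */
            early = a[i];
        return early;           /* load latency overlapped with other work */
    }
    return -1;
}
```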
A common mechanism to perform hardware-based prefetching for regular accesses to arrays and chained...
As the degree of instruction-level parallelism in superscalar architectures increases, the gap betwe...
Instruction cache miss latency is becoming an increasingly important performance bottleneck, especia...
By exploiting fine grain parallelism, superscalar processors can potentially increase the performanc...
The performance of superscalar processors is more sensitive to the memory system delay than their si...
In order to improve performance, future parallel systems will continue to increase the processing po...
Modern processors and compilers hide long memory latencies through non-blocking loads or explicit so...
This paper describes a new hardware approach to data and instruction prefetching for superscalar pr...
The large latency of memory accesses in modern computer systems is a key obstacle to achieving high ...
In computer systems, latency tolerance is the use of concurrency to achieve high performance in spit...
VLIW/EPIC (Very Large Instruction Word/Explicitly Parallel Instruction Computing) processors are inc...
A common approach to enhance the performance of processors is to increase the number of function uni...
Conventional cache prefetching approaches can be either hardware-based, generally by using a one-blo...