Abstract: We describe the Slice Processor micro-architecture, which implements a generalized operation-based prefetching mechanism. Operation-based prefetchers predict the series of operations, or the computation slice, that can be used to calculate forthcoming memory references. This is in contrast to outcome-based predictors, which exploit regularities in the (address) outcome stream. Slice processors generalize existing operation-based prefetching mechanisms such as stream buffers, where the operation itself is fixed in the design (e.g., address + stride). A slice processor dynamically identifies frequently missing loads and extracts the relevant address computation slices on-the-fly. Such slices are then executed in parallel...
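The mechanism sketched in the abstract has three phases: spot frequently missing ("delinquent") loads, capture the backward slice of operations that computes the next address, and replay that slice ahead of the main program to prefetch. A minimal toy sketch of these phases, with illustrative names and thresholds not taken from the paper:

```python
# Hypothetical sketch of a slice processor's phases (all names and the
# threshold are illustrative, not from the paper):
#   1. identify delinquent loads by counting their cache misses,
#   2. capture the address-computation slice for such a load,
#   3. replay the slice ahead of the program to issue a prefetch.

MISS_THRESHOLD = 3  # misses before a load PC is considered delinquent

class SliceProcessor:
    def __init__(self):
        self.miss_counts = {}      # load PC -> observed cache misses
        self.slices = {}           # load PC -> captured address-computation slice
        self.prefetched = set()    # addresses prefetched ahead of demand use

    def record_miss(self, pc):
        # Phase 1: track misses; return True once the load is delinquent.
        self.miss_counts[pc] = self.miss_counts.get(pc, 0) + 1
        return self.miss_counts[pc] >= MISS_THRESHOLD

    def capture_slice(self, pc, slice_ops):
        # Phase 2: slice_ops is the sequence of operations that transforms
        # the current address into the next one (e.g. a pointer chase).
        self.slices[pc] = slice_ops

    def run_slice(self, pc, addr, memory):
        # Phase 3: replay the captured slice to compute the next address
        # and "prefetch" it before the main program demands it.
        for op in self.slices.get(pc, []):
            addr = op(addr, memory)
        self.prefetched.add(addr)
        return addr

# Example: a linked-list walk whose next-node load keeps missing.
memory = {100: 200, 200: 300, 300: 400}   # node -> next-node pointer
sp = SliceProcessor()
sp.capture_slice(pc=0x40, slice_ops=[lambda a, m: m[a]])  # next = mem[cur]

addr = 100
for _ in range(3):
    if sp.record_miss(pc=0x40):            # load at PC 0x40 is delinquent
        sp.run_slice(0x40, addr, memory)   # scout ahead of the main thread
    addr = memory[addr]                    # main program's demand load
```

The key contrast with outcome-based prediction is visible here: the slice processor replays the pointer-chasing *operation* rather than extrapolating a pattern in past addresses, so irregular (e.g. heap-allocated) address streams can still be covered.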
Modern processors rely heavily on speculation to provide performance. Techniques such as branch pred...
CPU speeds double approximately every eighteen months, while main memory speeds double only about ev...
An important technique for alleviating the memory bottleneck is data prefetching. Data prefetching ...
We study the dynamic stream of slices that lead to branches that foil an existing branch predictor a...
Pre-execution uses helper threads running in spare hardware contexts to trigger cache misses in fron...
As the gap between processor and memory speeds widens, program performance is increasingly dependent...
A relatively small set of static instructions has significant leverage on program execution performa...
Pre-execution attacks cache misses for which conventional address-prediction driven prefetching is i...
For many applications, branch mispredictions and cache misses limit a processor’s performance to a l...
An effective method for reducing the effect of load latency in modern processors is data prefetching...
The continually increasing speed of microprocessors stresses the need for ever faster instruction fe...
This paper describes future execution (FE), a simple hardware-only technique to accelerate individu...
We consider extensible processor designs in which the number of gates and the distance that a signal...
Abstract—Data prefetching of regular access patterns is an effective mechanism to hide the memory la...
Prior work in hardware prefetching has focused mostly on either predicting regular streams with unif...