Pre-execution systems reduce the impact of cache misses and branch mispredictions by forking a slice, a code fragment derived from the program, in advance of frequently mispredicted branches and frequently missing loads, in order to either resolve the branch or prefetch the load. Because unnecessary instructions are omitted, the slice reaches the branch or load before the main thread does. For loads, this time margin can reduce or even eliminate the cache miss delay.
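The time-margin argument can be illustrated with a small back-of-the-envelope model. This is only a sketch under simplifying assumptions that the abstract does not make: the function name, its parameters, and the cycle counts are hypothetical, and fork overhead and resource contention between the slice and the main thread are ignored.

```python
def remaining_miss_delay(main_cycles, slice_cycles, miss_latency):
    """Stall cycles the main thread still sees at the load (idealized model).

    main_cycles:  cycles the main thread needs, from the fork point, to reach the load
    slice_cycles: cycles the slice needs to reach and issue the same load
    miss_latency: cycles needed to service the cache miss
    All three parameters are illustrative assumptions, not measured values.
    """
    # Head start gained because the slice omits unnecessary instructions.
    margin = max(0, main_cycles - slice_cycles)
    # The part of the miss latency not hidden by the head start remains as a stall.
    return max(0, miss_latency - margin)


if __name__ == "__main__":
    # No head start: the slice is as slow as the main thread, full miss delay.
    print(remaining_miss_delay(120, 120, 200))
    # Partial head start: the delay is reduced.
    print(remaining_miss_delay(120, 40, 200))
    # Head start exceeds the miss latency: the delay is eliminated.
    print(remaining_miss_delay(300, 40, 200))
```

In this model the stall shrinks by one cycle for every cycle of head start, and it disappears once the head start is at least as long as the miss latency.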
Data prefetching effectively reduces the negative effects of long load latencies on the performance ...
Though current general-purpose processors have several small CPU cores as opposed to a single more c...
Abstract. Given the increasing gap between processors and memory, prefetching data into cache become...
A relatively small set of static instructions has significant leverage on program execution performa...
For many applications, branch mispredictions and cache misses limit a processor’s performance to a l...
Current trends in processor design are pointing to deeper and wider pipelines and superscalar archit...
Cache performance analysis is becoming increasingly important in microprocessor design. This work ex...
We study the dynamic stream of slices that lead to branches that foil an existing branch predictor a...
Pre-execution attacks cache misses for which conventional address-prediction driven prefetching is i...
As the degree of instruction-level parallelism in superscalar architectures increases, the gap betwe...
The speculated execution of threads in a multithreaded architecture plus the branch prediction used ...
Summary: We describe the Slice Processor micro-architecture that implements a generalized oper...
Predicated execution has been used to reduce the number of branch mispredictions by eliminating hard...
Despite years of study, branch mispredictions remain as a significant performance impediment in pipe...