A mechanism to reduce the cost of branches in pipelined processors is described and evaluated. It is based on the use of multiple prefetch, early computation of the target address, delayed branch, and parallel execution of branches. The implementation of this mechanism using a branch target instruction memory is described. An analytical model of the performance of this implementation makes it possible to measure the efficiency of the mechanism with a very low computational cost. The model is used to determine the size of cache lines that maximizes the processor performance, to compare the performance of the mechanism with that of other schemes, and to analyze the performance of the mechanism with two alternative cache organizations.
Peer Rev...
In simultaneous multithreaded architectures many separate threads are running concurrently, sharing ...
The exponentially increasing gap between processors and off-chip memory, as measured in processor cy...
Achieving high instruction issue rates depends on the ability to dynamically predict branches. We co...
Pipelined microprocessors allow the simultaneous execution of several machine instructions at a time...
The presence of branch instructions in an instruction stream may adversely affect the performance of...
Control dependencies are one of the major limitations on increasing the performance of pipelined proce...
While delayed branch mechanisms were popular with the designers of RISC processors, most superscalar...
Pipelining is a major technique used in high performance processors. But a fundamental drawback of p...
As the gap between memory and processor performance continues to grow, more and more programs will ...
This paper formulates and shows how to solve the problem of selecting the cache size and depth of ca...
Due to the character of the original source materials and the nature of batch digitization, quality ...
This paper proposes a method of buffering instructions by software-based prefetching. The method all...
In superpipeline microarchitecture, the instruction execution cycle is divided into many stages. Thi...
As the issue width and depth of pipelining of high performance superscalar processors increase, the ...