For years, single-thread performance was the dominant force driving processor development. In recent years, however, the poor scaling of single-thread superscalar performance and power concerns, coupled with the ever-increasing number of transistors available on chip, have shifted the focus from single-thread performance to thread-level parallelism on multi-core designs. The trend is for these cores to be narrower, with smaller windows. This dissertation addresses the question of how to maintain, and ideally improve, single-thread performance under such constraints. Mini-graph processing is a form of instruction fusion (the grouping of multiple operations into a single processing unit) that increases the instructions-per-cycle (IPC) ...
We are now reaching a point where further improving single-thread performance can only be done...
The increasing density of VLSI circuits has motivated research into ways to utilize large area budge...
This thesis is concerned with hardware approaches for maximizing the number of independent instructi...
A mini-graph is a dataflow graph that has an arbitrary internal size and shape but the interface of ...
Recently proposed techniques like mini-graphs, CCA-subgraphs, and static strands exploit application...
Several manufacturers have recently announced the first simultaneous-multithreaded processors, both ...
To maximize the performance of wide-issue superscalar out-of-order microprocessors, the issue stage ...
Modern-day graph workloads operate on huge graphs through pointer chasing which leads to high last-l...
has emphasized instruction-level parallelism, which improves performance by increasing the number of...
To achieve high performance, contemporary computer systems rely on two forms of parallelism: instruc...
Operating systems have been considered a cornerstone of the modern computer system, and the con-...
Modern CMPs are designed to exploit both instruction-level parallelism within processors and threadl...
The end of Dennard scaling also brought an end to frequency scaling as a means to improve performanc...