The performance of a program will ultimately be limited by its serial (scalar) portion, as pointed out by Amdahl's Law. Studies of instruction-level parallelism reported so far have mixed data-parallel program portions with scalar program portions, often leading to contradictory and controversial results. We report an instruction-level behavioral characterization of scalar code containing minimal data-parallelism, extracted from highly vectorized programs of the PERFECT benchmark suite running on a Cray Y-MP system. We classify scalar basic blocks according to their instruction mix, characterize the data dependencies seen in each class, and, as a first step, measure the maximum intrablock instruction-level parallelism available. We observ...
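As a rough illustration of the Amdahl's Law bound cited in the abstract, the following minimal Python sketch computes the overall speedup when only part of a program's run time is accelerated. The function names (`amdahl_speedup`, `amdahl_limit`) and the example fractions are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def amdahl_speedup(parallel_fraction: float, parallel_speedup: float) -> float:
    """Overall speedup when `parallel_fraction` of the run time is sped up
    by a factor of `parallel_speedup` and the rest stays serial (scalar)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if parallel_speedup <= 0.0:
        raise ValueError("parallel_speedup must be positive")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / parallel_speedup)


def amdahl_limit(parallel_fraction: float) -> float:
    """Upper bound on speedup as the parallel part becomes infinitely fast."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    serial_fraction = 1.0 - parallel_fraction
    return float("inf") if serial_fraction == 0.0 else 1.0 / serial_fraction


if __name__ == "__main__":
    # Illustrative numbers only: 90% of the run time vectorizes.
    for factor in (10.0, 100.0, 1000.0):
        print(factor, round(amdahl_speedup(0.9, factor), 3))
    print("limit", round(amdahl_limit(0.9), 3))
```

With these illustrative inputs, the speedup stays below the reciprocal of the serial fraction no matter how fast the vectorized part runs, which is why the scalar portion is the focus of the characterization.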
In the past, vector supercomputers achieved high performance with long arithmetic pipelines coupled ...
Scalability studies of parallel architectures have used scalar metrics to evaluate their performan...
A new breed of processors like the Cell Broadband Engine, the Imagine stream processor and ...
Modern throughput processors such as GPUs achieve high performance and efficiency by exploiting data...
An emerging trend in processor design is the addition of short vector instructions to general-purpos...
This report examines ultra-fine grain machine parallelism determined by various hardware styles and ...
This report presents a new architecture based on adding a vector pipeline to a superscalar micropro...
In this paper the performance of multiple-instruction-issue processors with variable register file si...
An emerging trend in processor design is the incorporation of short vector instructions into the ISA...
In this paper we analyze the effect of compiler optimizations on fine grain parallelism in scalar pr...
Simultaneous multithreaded vector architectures combine the best of data-level and instruction-level...
What is Instruction-Level Parallelism? --Scalar Operation --Loops --Pipelining --Loop Performanc...
The increasing density of VLSI circuits has motivated research into ways to utilize large area budge...
This paper presents a framework for characterizing the distribution of fine-grained parallelism, dat...