Abstract. Current media ISA extensions such as Sun’s VIS consist of SIMD-like instructions that operate on short vector registers. To exploit more parallelism in a superscalar processor equipped with such instructions, the issue width has to be increased. In the Complex Streamed Instruction (CSI) set, exploiting more parallelism does not involve issuing more instructions. In this paper we study how the performance of superscalar processors extended with CSI or VIS scales with the amount of parallel execution hardware. Results show that the performance of the CSI-enhanced processor scales very well. For example, increasing the datapath width of the CSI execution unit from 16 to 32 bytes improves the kernel-level performance by a facto...
General-purpose processors (GPPs) have been augmented with multimedia extensions to improve per...
The growing interest that multimedia processing has experienced during the last decade is motivatin...
To maintain a reasonable level of complexity, processor implementations contain Serializing Instruct...
Abstract—An instruction set extension designed to accelerate multimedia applications is presented an...
In this paper, we examine the impact of instruction level parallelism (ILP) on the full H.264 video ...
Multimedia applications are compute-intensive applications that often contain multiple streams of o...
The effective performance of wide-issue superscalar processors depends on many parameters, such as b...
Many important multimedia applications contain a significant fraction of reduction operations. Altho...
This paper aims to provide a quantitative understanding of the performance of image and video proces...
Abstract. Current SIMD extensions have proved to be effective for increasing the performance of gen...
There is a huge variety of processor microarchitectural techniques to decrease the program execution...
In pursuit of ever-increasing performance, more and more processor architectures have become multico...
Abstract — The efficient processing of MultiMedia Applications (MMAs) is currently one of the main b...
Very Long Instruction Word (VLIW) architectures exploit instruction level parallelism (ILP) with the...
The main aim of this short paper is to investigate multiple-instruction-issue in a high-performance ...