A comprehensive and coherent approach to increasing instruction-level parallelism (ILP) is devised. The key new tool in our envisioned system update is a parallel prefix-sum (PS) instruction, which will have an efficient hardware implementation, added to the instruction-set architecture. This addition provides, for the first time, a concrete way to recruit the whole knowledge base of parallel algorithms for that purpose. The potential increase in ILP is demonstrated by experimental results for a test application. The main technical contribution takes the form of a "completeness theorem". Perhaps surprisingly, the current abstract proves that in an envisioned system which employs parallel PS functiona...
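To make the role of a prefix-sum primitive concrete, the following is a minimal sketch in Python and not the PS instruction described in the abstract. It emulates an exclusive prefix sum sequentially and uses it for array compaction, a common building block in parallel algorithms. The function names `exclusive_prefix_sum` and `compact` are hypothetical and chosen only for illustration.

```python
from itertools import accumulate


def exclusive_prefix_sum(values):
    """Return out where out[i] is the sum of values[0..i-1] (out[0] is 0)."""
    if not values:
        return []
    inclusive = list(accumulate(values))
    # Shift right by one position to turn the inclusive scan into an exclusive one.
    return [0] + inclusive[:-1]


def compact(items, keep):
    """Keep the items satisfying `keep`, preserving their order.

    Each kept item gets its output slot from the prefix sum of the 0/1 flags,
    so every write in the final loop is independent of the others. This
    sequential loop only emulates what could be done in parallel.
    """
    flags = [1 if keep(x) else 0 for x in items]
    offsets = exclusive_prefix_sum(flags)
    out = [None] * sum(flags)
    for x, flag, offset in zip(items, flags, offsets):
        if flag:
            out[offset] = x
    return out
```

Because each kept element's destination index is known from the prefix sum alone, the writes do not depend on one another. This independence is the kind of property that lets a prefix-sum primitive expose parallelism.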
We introduce explicit multi-threading (XMT), a decentralized architecture that exploits fine-grained...
The increasing density of VLSI circuits has motivated research into ways to utilize large area budge...
Parallel prefix computation is perhaps the most frequently used subroutine in parallel algorithms to...
Abstract: "Experienced algorithm designers rely heavily on a set of building blocks and on the tools...
A new parallel algorithm for transforming an arithmetic infix expression into a parse tree is prese...
Denotational semantics is usually extensional in that it deals only with input/output properties of ...
The work presents a new principle for microprocessor design based on a pairwise balanced combinatori...
This dissertation demonstrates that through the careful application of hardware and software techniq...
dataflow processors, superscalar processors, instruction scheduling, trace scheduling, software pipe...
has emphasized instruction-level parallelism, which improves performance by increasing the number of...
Instruction Level Parallelism (ILP) is the number of instructions that can be executed in simultaneo...
Abstract: Customization of a (generic) processor to a particular application makes it possible to ac...
Parallel prefix sums algorithms are one of the simplest and most useful building blocks for construc...
This paper considers several approaches for increasing the performance of software imp...
The problem of parsing and compiling arithmetic expressions in parallel computational environments i...