The traditional permutation multiplication algorithm is now limited by memory latency rather than by CPU speed. A new cache-aware algorithm speeds up permutation multiplication by a factor of 3.4 on current CPUs. The new algorithm is limited by memory bandwidth, not by memory latency. Current trends indicate improving memory bandwidth and stagnant memory latency, which makes the new algorithm especially important for future computer architectures. In addition, we believe this “memory wall” will soon force a redesign of other common algorithms of symbolic algebra.
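To make the latency-versus-bandwidth contrast concrete, here is a minimal Python sketch. The abstract does not spell out the new algorithm, so the bucketed variant is only one plausible cache-aware scheme and is not claimed to be the authors' method. The function names and the `block` parameter are hypothetical. In a real implementation `block` would be chosen so that one window of `y` fits in cache, and the code would be written in a compiled language, since Python itself will not show the speedup.

```python
from typing import List


def multiply_naive(x: List[int], y: List[int]) -> List[int]:
    """Traditional product z[i] = y[x[i]].

    Each read y[x[i]] lands at an effectively random address, so on large
    inputs nearly every lookup pays a full memory-latency penalty.
    """
    return [y[xi] for xi in x]


def multiply_bucketed(x: List[int], y: List[int], block: int = 4) -> List[int]:
    """Illustrative cache-aware product (assumed scheme, see the lead-in).

    Random reads are replaced by sequential passes plus lookups that are
    confined to one block-sized window of y at a time.
    """
    n = len(x)
    nbuckets = (n + block - 1) // block

    # Phase 1: stream through x and append each value to the bucket that
    # covers its range. Appends within a bucket are sequential writes.
    buckets: List[List[int]] = [[] for _ in range(nbuckets)]
    for xi in x:
        buckets[xi // block].append(xi)

    # Phase 2: within bucket b, every lookup falls inside
    # y[b * block : (b + 1) * block], a window small enough to stay cached.
    for bucket in buckets:
        for k in range(len(bucket)):
            bucket[k] = y[bucket[k]]

    # Phase 3: replay x in order. Phase 1 preserved the order of i inside
    # each bucket, so the next unread entry of the matching bucket is y[x[i]].
    cursor = [0] * nbuckets
    z = [0] * n
    for i, xi in enumerate(x):
        b = xi // block
        z[i] = buckets[b][cursor[b]]
        cursor[b] += 1
    return z
```

Both functions return the same permutation. The bucketed version does more total work and moves more data, but its accesses are sequential or cache-local, which is the sense in which such an algorithm is bound by bandwidth and not by latency.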
The objective of high performance computing (HPC) is to ensure that the computational power of hardw...
This report has been developed over the work done in the deliverable [Nava94]. There it was shown tha...
While the state of the art is relatively sophisticated in programming language support for computer ...
Abstract: In this work, the performance of basic and Strassen’s matrix multiplication algorithms ar...
This paper explores the interplay between algorithm design and a computer's memory hierarchy. M...
We present a model that enables us to analyze the running time of an algorithm on a computer with a ...
This Master's thesis examines whether a matrix multiplication program that combines the two efficiency stra...
Many fast algorithms in arithmetic complexity have hierarchical or recursive structures that make ef...
Abstract: Memoryless computation is a modern technique to compute any function of a set of registers...
In order to keep up with the demand for solutions to problems with ever-increasing data sets, both a...
We propose two new instructions, swperm and sieve, that can be used to efficiently complete an arbit...
Memoryless computation is a new technique to compute any function of a set of registers by updating ...
Symbolic computation has underpinned a number of key advances in Mathematics and Computer Science. A...
We desire to permute N items w_0, ..., w_{N-1} in an ultracomputer containing P processing element...
How should one design and implement a program for the multiplication of sparse polynomials? This is ...