Increasingly, modern computing problems, including many scientific and business applications, require huge amounts of data to be examined, modified, and stored. Parallel computers can be used to decrease the time needed to operate on such large data sets, by allowing computations to be performed on many pieces of data at once. For example, on the DECmpp machine used in our research, there are 2048 processors in the parallel processor array. The DECmpp can read data into each of these processors, perform a computation in parallel on all of it, and write the data out again, theoretically decreasing the execution time by a factor of 2048 over the time required by one of its processors. Often, the computations that occur after the data is in th...
For matrix multiplication on hypercube multiprocessors with the product matrix accumulated in place ...
We desire to permute N items w_0, ..., w_{N-1} in an ultracomputer containing P processing element...
Communication is a major factor determining the performance of algorithms on current computing syste...
This paper presents asymptotically equal lower and upper bounds for the number of parallel I/O opera...
This paper presents an architecture-independent method for performing BMMC permutations on multiproc...
We investigate the problem of permuting n data items on an EREW PRAM with p processors using little ...
The ability to perform permutations of large data sets in place reduces the amount of necessary avai...
Optimal usage of the memory system is a key element of fast GPU algorithms. Unfortunately many commo...
Solving large permutation Combinatorial Optimization Problems (COPs) using Branch-and-Bound (B&B) al...
We give asymptotically equal lower and upper bounds for the number of parallel I/O operations requir...
The authors implemented and measured several methods to perform BMMC permutations on the MasPar MP-2...
This chapter describes the Decomposable Bulk Synchronous Parallel (D-BSP) model of computation, as ...
We tackle the feasibility and efficiency of two new parallel algorithms that s...