We give asymptotically equal lower and upper bounds for the number of parallel I/O operations required to perform bit-matrix-multiply/complement (BMMC) permutations on parallel disk systems. In a BMMC permutation on N records, where N is a power of 2, each (lg N)-bit source address x maps to a corresponding (lg N)-bit target address y by the matrix equation y = Ax XOR c, where matrix multiplication is performed over GF(2). The characteristic matrix A is (lg N) x (lg N) and nonsingular over GF(2). Under the Vitter-Shriver parallel-disk model with N records, D disks, B records per block, and M records of memory, we show a universal lower bound of $\Omega \left( \frac{N}{BD} \left( 1 + \frac{\operatorname{rank} \gamma}{\lg (M/B)} \right) \right)$ parallel ...
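The address mapping y = Ax XOR c described above can be illustrated with a minimal in-memory sketch. It only computes target addresses; it does not model disks, blocks, or the I/O bounds. The function names, the packing of each row of A into an integer, and the example matrix and complement vector are assumptions made for illustration and do not come from the paper.

```python
def bmmc_target(x, rows, c):
    """Return y = A x XOR c over GF(2) for one source address x.

    Assumed encoding (hypothetical): rows[i] holds row i of A packed into an
    int, so bit j of rows[i] is A[i][j]; bit 0 of x is the least significant
    address bit.
    """
    y = 0
    for i, row in enumerate(rows):
        # Dot product over GF(2) is the parity of the bitwise AND.
        bit = bin(row & x).count("1") & 1
        y |= bit << i
    return y ^ c


def bmmc_permutation(rows, c):
    """Return the target address of every source address 0 .. N-1, N = 2^(lg N)."""
    n = len(rows)  # lg N, since A is (lg N) x (lg N)
    return [bmmc_target(x, rows, c) for x in range(1 << n)]


# Illustrative example with lg N = 3 (N = 8 records).
# A is a permutation matrix here, hence nonsingular: y0 = x1, y1 = x0, y2 = x2.
rows = [0b010, 0b001, 0b100]
c = 0b001  # complement vector: flips bit 0 of every target address
targets = bmmc_permutation(rows, c)
print(targets)
```

Because A is nonsingular, every target address is hit exactly once, so `targets` is a permutation of 0 through N-1. A singular matrix would map two source addresses to the same target.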
We present lower bounds on the amount of communication that matrix multiplication algorithms must pe...
We tackle the feasibility and efficiency of two new parallel algorithms that s...
We provide time lower bounds for sequential and parallel algorithms deciding bisimulation on labelled...
This paper presents asymptotically equal lower and upper bounds for the number of parallel I/O opera...
We give asymptotically equal lower and upper bounds for the number of parallel I/O operations requir...
The ability to perform permutations of large data sets in place reduces the amount of necessary avai...
Increasingly, modern computing problems, including many scientific and business applications, requir...
This paper presents an architecture-independent method for performing BMMC permutations on multiproc...
In a generalized shuffle permutation an address (a[q-1]a[q-2]...a[0]) receives its content from an a...
In this paper we propose models of combinatorial algorithms for the Boolean Matrix Multiplication (B...
The authors implemented and measured several methods to perform BMMC permutations on the MasPar MP-2...
Communication lower bounds have long been established for matrix multiplicatio...
We propose COSMA: a parallel matrix-matrix multiplication algorithm that is near communication-optim...
Optimal usage of the memory system is a key element of fast GPU algorithms. Unfortunately many commo...