We develop lower bounds on communication in the memory hierarchy or between processors for nested bilinear algorithms, such as Strassen's algorithm for matrix multiplication. We build on a previous framework that establishes communication lower bounds via the rank expansion, i.e., the minimum rank of any fixed-size subset of columns of a matrix, for each of the three matrices encoding a bilinear algorithm. This framework provides lower bounds for any way of computing a bilinear algorithm, which encompasses a larger space of algorithms than a fixed dependency graph. Two bilinear algorithms can be nested by taking Kronecker products between their encoding matrices. Our main result is a lower bound on the rank expansion of a matrix construc...
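As a small illustration of the nesting described above, the sketch below builds a 49-multiplication algorithm for 4 × 4 matrix multiplication by taking Kronecker products of Strassen's 2 × 2 encoding matrices. The matrices U, V, W and the block-recursive vectorization are the textbook conventions for Strassen's algorithm, assumed here rather than taken from the abstract:

```python
import numpy as np

# Strassen's 2x2 bilinear algorithm as three encoding matrices.
# Operands are vectorized row-major: [x11, x12, x21, x22].
U = np.array([  # 7x4: left operands of the 7 products
    [1, 0, 0, 1],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [-1, 0, 1, 0],
    [0, 1, 0, -1],
])
V = np.array([  # 7x4: right operands of the 7 products
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, -1],
    [-1, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
])
W = np.array([  # 4x7: recombination of the products into C
    [1, 0, 0, 1, -1, 0, 1],
    [0, 0, 1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0, 0, 0],
    [1, -1, 1, 0, 0, 1, 0],
])

def vec_blocks(X):
    # Block-recursive vectorization of a 4x4 matrix: 2x2 blocks in
    # row-major order, each block itself flattened row-major (16 entries).
    return np.concatenate([X[i:i + 2, j:j + 2].reshape(-1)
                           for i in (0, 2) for j in (0, 2)])

def unvec_blocks(v):
    # Inverse of vec_blocks.
    X = np.empty((4, 4))
    k = 0
    for i in (0, 2):
        for j in (0, 2):
            X[i:i + 2, j:j + 2] = v[k:k + 4].reshape(2, 2)
            k += 4
    return X

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, (4, 4))
B = rng.integers(-3, 4, (4, 4))

# Nesting: Kronecker products of the encodings give a 7*7 = 49
# multiplication algorithm for 4x4 matrix multiplication.
m = (np.kron(U, U) @ vec_blocks(A)) * (np.kron(V, V) @ vec_blocks(B))
C = unvec_blocks(np.kron(W, W) @ m)
assert np.allclose(C, A @ B)
```

The block-recursive vectorization matters: the Kronecker product composes the outer (block-level) and inner (entry-level) linear combinations only when the outer index varies slowest in the vectorized operands.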
A tight Ω((n/√M)^(log₂ 7) · M) lower bound is derived on the I/O complexity of Strassen’s algorithm to ...
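To get a feel for this bound, the snippet below evaluates Ω((n/√M)^(log₂ 7) · M) against the classical Ω(n³/√M) bandwidth lower bound for one illustrative problem size and fast-memory size (the values of n and M are assumptions chosen for illustration):

```python
import math

# Strassen's I/O lower bound (n/sqrt(M))^(log2 7) * M versus the
# classical cubic bound n^3 / sqrt(M), for illustrative sizes.
n, M = 2**14, 2**20  # matrix dimension and fast-memory size in words (assumed)

strassen_bound = (n / math.sqrt(M)) ** math.log2(7) * M
classical_bound = n**3 / math.sqrt(M)

print(f"Strassen I/O lower bound:  {strassen_bound:.3e} words")
print(f"classical I/O lower bound: {classical_bound:.3e} words")
```

Because log₂ 7 ≈ 2.807 < 3, Strassen's bound grows more slowly in n, so for large enough n it lies below the classical bound, consistent with Strassen's asymptotically smaller arithmetic cost.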
In this paper we study the tradeoff between parallelism and communication cost in a map-reduce compu...
A parallel algorithm has perfect strong scaling if its running time on P processors is linear in 1/P...
We present several bilinear algorithms for the acceleration of multiplication of n × n matri...
We introduce a new and easily applicable criterion called rank immunity for estimating the m...
The movement of data (communication) between levels of a memory hierarchy, or between parallel proce...
We investigate two methods for proving lower bounds on the size of small depth circuits, namely the ...
Dense linear algebra computations are essential to nearly every problem in scientific computing and ...
We present lower bounds on the amount of communication that matrix multiplication algorithms must pe...
Although general theories are beginning to emerge in the area of automata based complexity t...
In this paper we propose a new approach to the study of the communication requirements of distribute...
Thesis (Ph.D.)--University of Washington, 2020. In this thesis, we study basic lower bound questions i...
In this paper, we focus on the parallel communication cost of multiplying a ma...
Multiparty communication complexity is a measure of the amount of communication required to com...
In this paper we propose models of combinatorial algorithms for the Boolean Matrix Multiplication (B...