We present a calculus to formalize and assign costs to parallel computations over multidimensional dense arrays. The calculus extends a simple distribution calculus (proposed in previous work) with computation and data collection. We consider an SPMD programming model in which process interaction can take place through point-to-point as well as collective operations, much in the style of MPI. We aim to give a rigorous description of all stages of data-parallel applications working over dense arrays: initial distribution (i.e., partition and replication) of arrays over a set of processors, parallel computation over distributed data, exchange of intermediate results, and final data gather. In the paper, besides defining the calculus, ...
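To make the four stages concrete, here is a minimal plain-Python sketch. It is not the calculus itself: it simulates an SPMD matrix-vector product on `p` processes held in Python lists, with the matrix partitioned by row blocks, the vector replicated, a purely local computation step, and a final gather to rank 0. The cost model (counting words sent from or to rank 0) and all function names are hypothetical; a real implementation would use MPI collectives such as `MPI_Scatter`, `MPI_Bcast` and `MPI_Gather`.

```python
from typing import List, Tuple

Matrix = List[List[float]]

def block_rows(n_rows: int, p: int) -> List[range]:
    """Split row indices into p contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(n_rows, p)
    ranges, start = [], 0
    for k in range(p):
        size = base + (1 if k < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges

def partition(a: Matrix, p: int) -> Tuple[List[Matrix], int]:
    """Scatter row blocks of `a` from rank 0; cost = words sent to ranks 1..p-1."""
    blocks = [[list(a[i]) for i in r] for r in block_rows(len(a), p)]
    words = sum(len(row) for b in blocks[1:] for row in b)
    return blocks, words

def replicate(x: List[float], p: int) -> Tuple[List[List[float]], int]:
    """Broadcast x from rank 0 so every process holds a copy; cost = (p - 1) * len(x)."""
    return [list(x) for _ in range(p)], (p - 1) * len(x)

def local_matvec(block: Matrix, x: List[float]) -> List[float]:
    """Purely local computation: each process multiplies its own rows by its copy of x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in block]

def gather(pieces: List[List[float]]) -> Tuple[List[float], int]:
    """Concatenate results on rank 0 in rank order; cost = words received from ranks 1..p-1."""
    result: List[float] = []
    for piece in pieces:
        result.extend(piece)
    return result, sum(len(piece) for piece in pieces[1:])

def distributed_matvec(a: Matrix, x: List[float], p: int) -> Tuple[List[float], int]:
    """Distribute, compute, gather; return y = a @ x and the total words communicated."""
    blocks, c_scatter = partition(a, p)
    copies, c_bcast = replicate(x, p)
    # SPMD step: the same code runs on every simulated rank, each with its own data.
    partial = [local_matvec(blocks[r], copies[r]) for r in range(p)]
    y, c_gather = gather(partial)
    return y, c_scatter + c_bcast + c_gather
```

For example, `distributed_matvec([[float(4 * i + j) for j in range(4)] for i in range(5)], [1.0] * 4, 3)` returns `([6.0, 22.0, 38.0, 54.0, 70.0], 23)`: 12 words to scatter rows, 8 to replicate `x`, and 3 to gather `y`.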
We propose a theoretical framework for the performance analysis and optimization of parallel progr...
Modern large-scale deep learning workloads highlight the need for parallel execution across many dev...
We consider compile-time distribution of array data in a distributed-memory implementation of...
Two approaches to architecture-independent parallel computation are investigated: a constructive fun...
We propose a set-theoretic model for parallelism. The model is based on separate distributio...
Multipartitioning is a strategy for decomposing multi-dimensional arrays into tiles and mapping the ...
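The abstract is truncated before it states the mapping, so, as an illustration only, assume the classic diagonal scheme for two dimensions: with `p` processors the array is cut into `p x p` tiles and tile `(i, j)` goes to processor `(j - i) mod p`. Each processor then owns exactly one tile in every tile row and every tile column, which keeps all processors busy during line sweeps along either dimension. The function names are hypothetical, and the paper's scheme may be more general.

```python
from typing import List

def diagonal_multipartition(p: int) -> List[List[int]]:
    """Owner table for a p x p tiling: tile (i, j) belongs to processor (j - i) mod p."""
    return [[(j - i) % p for j in range(p)] for i in range(p)]

def is_balanced_for_sweeps(owner: List[List[int]]) -> bool:
    """True if every processor owns exactly one tile in each tile row and each tile column."""
    p = len(owner)
    everyone = set(range(p))
    rows_ok = all(set(row) == everyone for row in owner)
    cols_ok = all({owner[i][j] for i in range(p)} == everyone for j in range(p))
    return rows_ok and cols_ok

def owner_of(r: int, c: int, n: int, p: int) -> int:
    """Owner of element (r, c) of an n x n array, assuming p divides n."""
    t = n // p  # tile side length
    return (c // t - r // t) % p
```

For `p = 3` the owner table is `[[0, 1, 2], [2, 0, 1], [1, 2, 0]]`, a Latin square, which is exactly the property `is_balanced_for_sweeps` checks.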
This article focuses on principles for the design of efficient parallel algorithms for distributed m...
On shared-memory parallel computers (SMPCs), it is natural to focus on decomposing the computation (...
This paper considers the expression and derivation of efficient data parallel programs for SIMD and ...
We present algorithms for the transportation of data in parallel and distributed systems that would ...
Multipartitioning is a strategy for partitioning multi-dimensional arrays on a collection of p...
Berry and Curien, building on Kahn and Plotkin's theory of Concrete Data Structures and s...
Building on Kahn and Plotkin's theory of concrete data structures and sequential functions, ...
Data distribution functions are introduced. They are matched with scheduling functions. The processor...
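The abstract is cut off before it defines its functions. Assuming the usual block and cyclic distributions of a one-dimensional array of `n` elements over `p` processors, a distribution function maps a global index to a (processor, local offset) pair. One simple way to match a scheduling function to it is the owner-computes rule: iteration `i` runs on the processor that owns element `i`. This is an illustration, not the paper's definition, and all names are hypothetical.

```python
from typing import Callable, List, Tuple

def block_dist(i: int, n: int, p: int) -> Tuple[int, int]:
    """Block distribution: index i -> (processor, local offset), block size ceil(n / p)."""
    b = -(-n // p)  # ceiling division
    return i // b, i % b

def cyclic_dist(i: int, p: int) -> Tuple[int, int]:
    """Cyclic distribution: index i -> (processor, local offset)."""
    return i % p, i // p

def owner_computes(n: int, p: int, owner: Callable[[int], int]) -> List[List[int]]:
    """Scheduling function matched to a distribution: iteration i runs on owner(i)."""
    schedule: List[List[int]] = [[] for _ in range(p)]
    for i in range(n):
        schedule[owner(i)].append(i)
    return schedule
```

With `n = 10` and `p = 4`, the block distribution schedules iterations as `[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]` and the cyclic one as `[[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]`. With the ceiling block size used here, some processors can receive no elements at all (for example `n = 5`, `p = 4`).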
Building on Kahn and Plotkin's theory of concrete data structures and sequential function...