Modern large-scale deep learning workloads highlight the need for parallel execution across many devices in order to fit model data into hardware accelerator memories. In these settings, array redistribution may be required during a computation, but can also become a bottleneck if not done efficiently. In this paper we address the problem of redistributing multi-dimensional array data in SPMD computations, the most prevalent form of parallelism in deep learning. We present a type-directed approach to synthesizing array redistributions as sequences of MPI-style collective operations. We prove formally that our synthesized redistributions are memory-efficient and perform no excessive data transfers. Array redistribution for SPMD computations ...
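To make the idea of expressing a redistribution as collective operations concrete, here is a minimal NumPy sketch that simulates devices as list entries. It is not the paper's synthesis procedure: the helper names (`shard`, `all_to_all`, `all_gather_then_slice`) and the 4x4 example are assumptions chosen for illustration. It moves a row-sharded array to a column-sharded layout in two ways and compares the largest buffer each device must hold.

```python
import numpy as np

def shard(x, axis, p):
    """Split x into p equal shards along `axis`, one per simulated device."""
    return np.split(x, p, axis=axis)

def all_to_all(shards, split_axis, concat_axis):
    """Simulated all-to-all (hypothetical helper, not a real MPI binding).

    Each device splits its shard into p chunks along `split_axis` and sends
    chunk j to device j; each device concatenates what it receives along
    `concat_axis` in sender order.
    """
    p = len(shards)
    chunks = [np.split(s, p, axis=split_axis) for s in shards]
    return [
        np.concatenate([chunks[src][dst] for src in range(p)], axis=concat_axis)
        for dst in range(p)
    ]

def all_gather_then_slice(shards, gather_axis, slice_axis):
    """Naive alternative: every device materializes the full array, then slices."""
    p = len(shards)
    full = np.concatenate(shards, axis=gather_axis)  # full copy on each device
    return [np.split(full, p, axis=slice_axis)[d] for d in range(p)], full.size

x = np.arange(16).reshape(4, 4)
p = 2
rows = shard(x, axis=0, p=p)                      # row-sharded layout
cols_a2a = all_to_all(rows, split_axis=1, concat_axis=0)
cols_naive, naive_peak = all_gather_then_slice(rows, gather_axis=0, slice_axis=1)
a2a_peak = max(c.size for c in cols_a2a)          # largest per-device buffer
```

In this toy setting both routes produce the same column shards, but the all-to-all route never holds more than one shard's worth of elements per device, while the gather-based route materializes the whole array on every device.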
A programming model that is widely approved today for large applications is pa...
We review a decade's work on message passing MIMD parallel computers in the areas of hardware, so...
In many scientific applications, array redistribution is usually required to enhance data lo...
In many scientific applications, array redistribution is usually required to enhance dat...
In HPC, data redistributions (reorganizations) are used in parallel applications to improve performa...
This thesis argues that a modular, source-to-source translation system for distributed-shared memory...
Languages such as High Performance Fortran implement parallel algorithms by distributing large data ...
In this paper, we present efficient methods for multidimensional array redistribution. B...
Appropriate data distribution has been found to be critical for obtaining good performance on Distri...
Dynamic redistribution of arrays is required very often in programs on distributed memory parallel c...
Distributed-memory message-passing machines deliver scalable performance but are difficult to progr...
Resampling is a well-known statistical algorithm that is commonly applied in the context of Particle...
The bandwidth mismatch between processor and main memory is one major limiting problem. Although str...
Two major techniques are commonly used to meet real-time inference limitations when distributing mod...
Array redistribution is usually required to enhance algorithm performance in many paral...