In distributed computing, collective operations coordinate communication and synchronization among multiple processing units, enabling efficient data exchange and collaboration. Scientific applications such as physical simulations, computational fluid dynamics, and scalable deep learning require complex computations that can be parallelized across the nodes of a distributed system. These applications often exhibit data-dependent communication patterns in which collective operations are critical to achieving high performance. Optimizing collective operations for scientific applications and deep learning therefore involves improving the underlying algorithms, communication patterns, and data distribution strategies to minimize communication overhead and latency.
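To make the role of collectives concrete, the following minimal C sketch uses MPI's allreduce, the collective most commonly used to aggregate gradients in data-parallel deep learning. The vector length, values, and the final averaging step are illustrative assumptions, not drawn from any particular framework.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each process contributes a local vector; in data-parallel training
       this would hold the gradients computed on a local mini-batch.
       The length 4 and the values are purely illustrative. */
    const int n = 4;
    float local[4], global[4];
    for (int i = 0; i < n; i++)
        local[i] = (float)(rank + 1) * (i + 1);

    /* A single collective call sums the vectors element-wise across all
       ranks and leaves the identical result on every rank, replacing the
       many point-to-point messages a hand-rolled exchange would need. */
    MPI_Allreduce(local, global, n, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);

    /* Dividing by the number of ranks turns the sum into the average,
       the usual reduction step for synchronous gradient aggregation. */
    for (int i = 0; i < n; i++)
        global[i] /= (float)size;

    if (rank == 0)
        printf("averaged[0] = %f\n", global[0]);

    MPI_Finalize();
    return 0;
}
```

The performance of this one call dominates the scaling behavior of synchronous data-parallel training, which is why the choice of allreduce algorithm (ring, recursive doubling, tree-based) and its mapping onto the network topology are central optimization targets.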