openaire: EC/H2020/818665/EU//UniSDyn

Funding Information: This work was supported by the Academy of Finland ReSoLVE Centre of Excellence (grant number 307411); the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Project UniSDyn, grant agreement no. 818665); and CHARMS within ASIAA from Academia Sinica.

Publisher Copyright: © 2022 The Authors

Modern compute nodes in high-performance computing provide a tremendous level of parallelism and processing power. However, as arithmetic performance has been observed to increase at a faster rate relative to memory and network bandwidths, optimizing data movement has become critical for achieving strong scaling in many communication-heavy a...