Coupling regular topologies with optimized routing algorithms is key to pushing the performance of interconnection networks in HPC systems. In this paper we present Dmodc, a fast deterministic routing algorithm for Parallel Generalized Fat-Trees (PGFTs) which minimizes congestion risk even under massive topology degradation caused by equipment failure. It applies a modulo-based computation of forwarding tables among switches closer to the destination, using only knowledge of subtrees for pre-modulo division. Dmodc allows complete rerouting of topologies with tens of thousands of nodes in less than a second, which greatly helps centralized fabric management react to faults with high-quality routing tables and no impact on running application...
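The modulo-based table computation mentioned in the abstract can be illustrated with a generic destination-modulo sketch. This is not the Dmodc algorithm itself: the function names, the `divider` parameter (an assumed stand-in for the pre-modulo division), and the port numbers are all hypothetical, chosen only to show how a deterministic modulo choice spreads destinations over the ports that lead closer to them, and how the choice adapts when a port disappears.

```python
def select_port(dest_id, candidate_ports, divider=1):
    """Deterministically pick one output port for a destination.

    candidate_ports: ports of this switch that lead closer to the
    destination (hypothetical input; how they are found is not shown).
    divider: assumed pre-modulo divisor, so that groups of `divider`
    consecutive destinations share the same port.
    """
    if not candidate_ports:
        raise ValueError("destination unreachable from this switch")
    ordered = sorted(candidate_ports)  # fixed order keeps the choice deterministic
    return ordered[(dest_id // divider) % len(ordered)]


def build_forwarding_table(dest_ids, candidates_for, divider=1):
    """Map every destination to a port; candidates_for(dest) returns its candidate ports."""
    return {d: select_port(d, candidates_for(d), divider) for d in dest_ids}


if __name__ == "__main__":
    healthy = [10, 11, 12, 13]   # illustrative port numbers
    degraded = [10, 11, 13]      # same switch after losing port 12
    print(build_forwarding_table(range(8), lambda d: healthy))
    print(build_forwarding_table(range(8), lambda d: degraded))
```

Because the modulo is taken over the ports that are currently usable, recomputing the table after a failure only requires the updated candidate list, which is the property that makes this family of schemes quick to rerun.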
Reliable and highly available computer networks must implement resilient fast ...
Fat-trees are a class of routing networks for hardware-efficient parallel computation. This paper pr...
With the rapid shrinking of technology and growing integration capacity, the probability of failures...
Coupling regular topologies with optimized routing algorithms is key in pushin...
Coupling regular topologies with optimised routing algorithms is key in pushin...
High-Performance Computing (HPC) clusters are made up of a variety of node types (usually compute, I...
Clusters of PCs have become very popular to build high performance computers. These machines use com...
High-Performance Computing (HPC) clusters are made up of a variety of node types (usually compute, I...
The final publication is available at Springer via http://link.springer.com/article/10.1007%2Fs11227...
The final publication is available at Springer via http://link.springer.com/article/10.1007%2Fs11227...
Network failures are frequent and disruptive, and can significantly reduce the...
Network failures are frequent and disruptive, and can significantly reduce the...
Massively parallel computing systems are being built with hundreds or thousands of components such a...
Reliable and highly available computer networks must implement resilient fast ...
HPC network topology design is currently shifting from high-performance, higher-cost Fat-Trees to mo...