This thesis examines the problems of designing massively fault-tolerant routing schemes and analyzing their performance. We concentrate primarily on distributed memory systems consisting of a large number of processors. The size of these parallel systems induces us to focus on distributed routing algorithms which require only local fault information, such as the fault status of immediate neighbors. Our notion of faults is quite general. We consider a processor (or communication link) to be faulty if it is unavailable for use in message communications. This definition includes hardware failures as well as congested processors (i.e., communications hot spots). Hence, it is quite natural for us to consider systems in which massive numbers of c...
Massively parallel computing systems are being built with thousands of nodes. The interconn...
A central problem in massively parallel computing is efficiently routing data between processors. T...
We describe fault-tolerant routing of multicast messages in mesh-based wormhole-switched multicomp...
The design and analysis of fault tolerant message routing schemes for large parallel systems has bee...
Fast and efficient communications are essential to the success of large-scale multiprocessor paralle...
Massively parallel computing systems are being built with hundreds or thousands of components such a...
Efficient communication in networks is a prerequisite to exploit the performance of large parallel...
This thesis presents a fault-tolerant message passing system incorporating a variation of the distri...
In real-time computing applications, it is important to have parallel computing systems that not onl...
Efficient data motion has been critical in high performance computing for as long as computers have ...