Abstract: The construction of distributed algorithms for matrix computations built on top of distributed data aggregation algorithms with randomized communication schedules is investigated. For this purpose, a new aggregation algorithm for summing or averaging distributed values, the push-flow algorithm, is developed, which achieves superior resilience properties with respect to failures compared to existing aggregation methods. It is illustrated that on a hypercube topology it asymptotically requires the same number of iterations as the optimal all-to-all reduction operation and that it scales well with the number of nodes. Orthogonalization is studied as a prototypical matrix computation task. A new fault tolerant distributed orthogonalizat...
Distributed clustering algorithms have proven to be effective in dramatically reducing execution tim...
The lack of efficient resilience solutions is expected to be a major problem for the coming exascale...
This paper introduces a novel distributed algorithm over static directed graphs for solving big data...
The construction of distributed algorithms for matrix computations built on top of distributed data...
Abstract: In this paper, we investigate and compare the fault tolerance properties and resilience of g...
Distributed matrix computations (matrix-vector and matrix-matrix multiplications) are at the heart o...
As an increasing number of modern big data systems utilize horizontal scaling, the general trend in t...
Abstract: The Do-All problem is about scheduling t similar and independent tasks to be performed by p ...
Big data projects increasingly make use of networks of heterogeneous computational resources for sci...
We present a new approach to fault tolerance for High Performance Computing systems. Our approach is ...
With the proliferation of parallel and distributed systems, it is an increasingly important problem ...
A ubiquitous problem in computer science research is the optimization of computation on large data s...
Aggregation is an important building block of modern distributed applications, allowing the determin...
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer...