The growing demand for computational power to solve complex scientific problems is driving the physical scale of systems to hundreds and thousands of nodes. Ideally, scaling up the number of nodes should reduce completion time, but algorithmic and system-environment factors limit the scalability of parallel applications. Reliability is one of the major factors limiting performance, especially when applications are scaled to thousands of nodes. In this paper, we propose a reliability-aware optimal k-node allocation algorithm and compare it with Round Robin and reliability-aware resource allocation algorithms. Our simulation results indicate that giving flexibility to the resource manager in choosing the optimal number of node...
Rschen@cc.nctu.edu.tw (Received and accepted August 1993) Abstract. In this paper, we propose ...
Simultaneous consideration of both performance and reliability issues is important in the choice of ...
This paper reports: 1) parallelization of the two best known sequential algorithms (Dotson & Gobein,...
Present and future computational applications require massively parallel processors - Top500.org re...
The demand for more computational power to solve complex scientific problems has been driving the ph...
In high performance computing systems, parallel applications request a large number of resources for...
In this work, we present a heuristic method to reduce the computational time and the absolute error ...
This paper investigates the problem of allocating parallel application tasks to processors in hetero...
Distributed Computing Systems (DCS) have become a major trend in computer system design toda...
In this paper, we propose a simple, easily programmed exact method for obtaining the optimal...
This paper presents a mathematical model for a redundancy allocation problem (RAP) with k-out-of-n s...
The rapid progress of microprocessor and communication technologies has made the distributed computi...
The problem of designing multi-stage systems with a high degree of reliability has attracted the att...
Applications implemented on critical systems are subject to both safety critic...
This paper presents a mathematical model for a redundancy allocation problem (RAP) for the series-pa...