This research describes Fountain, a suite of software used to monitor the resources of a cluster. A cluster is a collection of individual computers that are connected with a high-speed communication network. Clusters are traditionally used by users who desire more resources, such as processing power and memory, than any single computer can provide. A common obstacle to effectively utilizing such a large-scale system is the management infrastructure, which often does not scale well as the system grows. Large-scale parallel systems provide new research challenges in the area of systems software, the programs or tools that manage the system from boot-up to running a parallel job. The approach presented in this thesis utilizes a collection s...
In this paper we describe the architecture of PerfMC, a performance monitoring system for clusters o...
The size of applications is becoming greater every day. Software components enable developers for ea...
This paper discusses ongoing research at Oak Ridge National Laboratory (ORNL) to make computing clus...
This research describes Fountain, a suite of programs used to monitor the resources of a cluster. A ...
Monitoring systems give network administrators a better view and understanding of their networks. Am...
The demand for an efficient fault tolerance system has led to the development of complex monitoring ...
We present a monitoring system for large-scale parallel and distributed computing environments that ...
Scalable system monitoring is a fundamental abstraction for large-scale networked systems. The g...
Current monitoring solutions are not well suited to monitoring large data centers in different ways:...
Large-scale computer clusters have in recent years become dominant for making computations in ...
Scalable management of distributed resources is one of the major challenges in deployment of large-s...
Distributed systems based on clusters of workstations are increasingly difficult to manage due to the...
In this paper, we present a structure for monitoring a large set of computational clusters. We illus...