This research describes Fountain, a suite of programs used to monitor the resources of a cluster. A cluster is a collection of individual computers connected via a high-speed communication network. Clusters are traditionally used by those who need more resources, such as processing power and memory, than any single computer can provide. A common obstacle to effectively utilizing such a large-scale system is the management infrastructure, which often does not scale well as the system grows. Large-scale parallel systems pose new research challenges in the area of systems software, the programs or tools that manage the system from boot-up to running a parallel job. The approach presented in this thesis utilizes a collection of ...
The size of applications is becoming greater every day. Software components enable developers for ea...
This paper discusses ongoing research at Oak Ridge National Laboratory (ORNL) to make computing clus...
Constant monitoring of a computer is essential to staying up to date about its state. Thi...
This research describes Fountain, a suite of software used to monitor the resources of a cluster. A ...
Monitoring systems give network administrators a better view and understanding of their networks. Am...
The demand for an efficient fault tolerance system has led to the development of complex monitoring ...
We present a monitoring system for large-scale parallel and distributed computing environments that ...
Scalable system monitoring is a fundamental abstraction for large-scale networked systems. The g...
In recent years, large-scale computer clusters have become dominant for performing computations in ...
Scalable management of distributed resources is one of the major challenges in deployment of large-s...
Current monitoring solutions are poorly suited to monitoring large data centers in several ways:...
In this paper, we present a structure for monitoring a large set of computational clusters. We illus...
Distributed systems based on clusters of workstations are increasingly difficult to manage due to the...
In this paper we describe the architecture of PerfMC, a performance monitoring system for clusters o...