Constant monitoring is essential for staying up to date about the state of a computer. This may seem trivial when sitting right in front of the machine, but monitoring it remotely is no longer as simple, and it becomes even more difficult when a large number of computers must be monitored. Because monitoring always places some load on the network and on the monitored computer itself, it is important to keep this overhead as low as possible. For a high-performance cluster built from many computers in particular, the monitoring approach must work as efficiently as possible and must not interfere with the actual operation of the supercomputer. Thus, the main goals of this w...
In this paper we describe the architecture of PerfMC, a performance monitoring system for clusters o...
The demand for an efficient fault tolerance system has led to the development of complex monitoring ...
Computationally intensive applications often require the employment of parallel/distributed solution...
In recent years, large-scale computer clusters have become dominant for making computations in ...
The CMS experiment's online cluster consists of 2300 computers and 170 switches or routers operating...
Effective management and utilization of large computer clusters requires suitable management tools....
Monitoring systems are necessary for the management of anything beyond the smallest networks of comp...
In this paper, we present a structure for monitoring a large set of computational clusters. We illus...
This research describes Fountain, a suite of programs used to monitor the resources of a cluster. A ...
The subject of this work relates to monitoring of large networks consisting of hundreds of active el...
This research describes Fountain, a suite of software used to monitor the resources of a cluster. A ...
Monitoring is the act of collecting information concerning the characteristics and status of resourc...
When working with distributed systems, detecting faults can be a difficult task, as abnormalities is...
Monitoring systems give network administrators a better view and understanding of their networks. Am...