Traditional cluster monitoring approaches consider each node in isolation, using manufacturer-specified extreme limits as thresholds for failure "prediction". We have developed a tool, OVIS, for monitoring and analyzing large computational platforms that instead uses a statistical approach, characterizing the behavior of a single device relative to that of a large number of statistically similar devices. Baseline capabilities of OVIS include the visual display of deterministic information about state variables (e.g., temperature, CPU utilization, fan speed) and their aggregate statistics. Visual consideration of the cluster as a comparative ensemble, rather than as singleton nodes, is an easy and useful method for tuning cluster configuration and dete...
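The idea of comparing one device against an ensemble of similar devices can be illustrated with a small sketch. The abstract does not specify the statistical model OVIS uses, so this is not its method: the example below swaps in a plain z-score test, and the function name `flag_outliers`, the threshold of 3 standard deviations, and the temperature readings are all hypothetical.

```python
from statistics import mean, stdev


def flag_outliers(readings, z_threshold=3.0):
    """Return the names of nodes whose reading lies more than
    z_threshold sample standard deviations from the ensemble mean.

    `readings` maps a node name to one state-variable value
    (e.g. a temperature). Illustrative stand-in only, not the
    model used by OVIS.
    """
    values = list(readings.values())
    if len(values) < 2:
        return []  # a spread cannot be estimated from fewer than two nodes
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all nodes identical, so nothing stands out
    return sorted(
        name for name, value in readings.items()
        if abs(value - mu) / sigma > z_threshold
    )


# Hypothetical temperatures (degrees C) for 20 similar nodes.
temps = {f"node{i:02d}": 40.0 + (i % 5) * 0.5 for i in range(20)}
# One node runs hot relative to its peers; the value is made up and is
# assumed to sit below any absolute manufacturer limit.
temps["node07"] = 58.0

outliers = flag_outliers(temps)
print(outliers)
```

A fixed-limit check would stay silent for as long as the hot node remained under its absolute limit, whereas the ensemble comparison flags it because it departs from the behavior of its peers.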
For the monitoring of large-scale clustered network systems (CNS), it suffices...
Modern data centers that provide Internet-scale services are stadium-size structures housing tens of...
The CMS experiment's online cluster consists of 2300 computers and 170 switches or routers operating...
This document describes how to obtain, install, use, and enjoy a better life with OVIS version 2.0. ...
This document describes how to obtain, install, use, and enjoy a better life with OVIS version 3.2. ...
This report summarizes the current statistical analysis capability of OVIS and how it works in conju...
Effective monitoring of large computational clusters demands the analysis of a vast amount of raw da...
Clusters have become the main parallel and distributed computing platform for high performance co...
HPC-ODA is a collection of datasets acquired on production HPC systems, which are representative of ...
Large-scale computer clusters have in recent years become dominant for performing computations in ...
Large-scale clusters are growing at a rapid pace, and the resulting amount of monitoring data produc...
Data analysis plays an indispensable role for understanding various phenomena. Cluster analysis, pri...
Current large scale computational research infrastructures are composed of multitudes of compute no...
Clustering, the task of grouping together similar items, is a frequently used method for processing ...