With the rapid growth in the number of distributed applications, a new dynamic server environment has emerged in which servers are grouped into clusters whose utilization depends on the current demand for the application. To provide reliable and smooth services, it is crucial to detect and fix erratic behavior of individual servers in these clusters. Standard techniques for this purpose deliver suboptimal results. We have developed a method based on machine learning techniques that detects outliers indicating a possibly problematic situation. The method inspects the performance of the rest of the cluster and provides system operators with additional information that allows them to quickly identify the failing nodes. We applied this...
Reliability is one of the key performance factors in data centres. The out-of-scale energy costs of ...
The IHEP local cluster is a middle-sized HEP data center which consists of 20'000 CPU slots, hundred...
Large-scale distributed computing infrastructures ensure the operation and maintenance of scientific...
Anomaly detection in the CERN OpenStack cloud is a challenging task due to the large scale of the co...
This thesis investigates the possibility of using anomaly detection on performance data of virtual s...
Monitoring has proved to be a crucial part of the operation lifecycle of any computer system, as it ...
The CERN automation infrastructure consists of over 600 heterogeneous industrial control systems wit...
Monitoring the health of large data centers is a major concern with the ever-increasing demand of gr...
Reliability, availability and maintainability determine whether or not a large-scale accelerator sys...
A Grid computing site consists of various services including Grid middlewares, such as Computing Ele...
Reliability, availability and maintainability are parameters that determine if a large-scale acceler...
Large microservice clusters deployed in the cloud can be very difficult to both monitor and debu...