Diagnosing IT issues is a challenging problem for large-scale distributed cloud environments due to complex and non-deterministic interrelations between the system components. Modern monitoring tools rely on AI-empowered data analytics for detection, root cause analysis, and rapid resolution of performance degradation. However, the successful adoption of AI solutions is anchored on trust. System administrators will not unthinkingly follow the recommendations without sufficient interpretability of solutions. Explainable AI is gaining popularity by enabling improved confidence and trust in intelligent solutions. For many industrial applications, explainable models with moderate accuracy are preferable to highly precise black-box ones. This pa...
Cloud computing systems fail in complex and unexpected ways, due to unexpected combinations of event...
From traditional networking to cloud computing, one of the essential but formidable tasks is to dete...
Operation and maintenance of large distributed cloud applications can quickly become unmanageably co...
Automated root cause analysis of performance problems in modern cloud computing infrastructures is o...
When operating large cloud computing infrastructures, ensuring healthiness of physical resources and...
Effective root cause analysis (RCA) of performance issues in modern cloud environments remains a h...
Cloud-Applications are the new industry standard way of designing Web-Applicat...
The Cloud computing paradigm has become the new industry standard way of desig...
Context: With an increasing number of applications running on a microservices-based cloud system (su...
Distributed tracing allows tracking user requests that span across multiple services and machines in...
In this paper, we present CLUE, a system event analytics tool for black-box performance dia...
The increasingly popular cloud-computing paradigm provides on-demand access to computing and storage...
Failures in computer systems can often be traced to software anomalies of various kinds. In ma...
Heterogeneous mobile, sensor, IoT, smart environment, and social networking applications have recent...
Cloud computing systems provide the facilities to make application services resilient against failur...