Multitier Web System Reliability: Identifying Causative Metrics and Analyzing Performance Anomaly Using a Regression Model

Sensors (Basel). 2023 Feb 8;23(4):1919. doi: 10.3390/s23041919.

Abstract

With the development of the Internet and communication technologies, the types of services provided by multitier Web systems are becoming more diverse and complex compared to those of the past. Ensuring a continuous availability of business services is crucial for multitier Web system providers, as service performance issues immediately affect customer experience and satisfaction. Large companies attempt to monitor the system performance indicator (SPI) that summarizes the status of multitier Web systems to detect performance anomalies at an early stage. However, the current anomaly detection methods are designed to monitor a single specific SPI. Moreover, the existing approaches consider performance anomaly detection and its root cause analysis separately, thereby aggravating the burden of resolving the performance anomaly. To support the system provider in diagnosing the performance anomaly, we propose an advanced causative metric analysis (ACMA) framework. First, we draw out 191 performance metrics (PMs) closely related to the target SPI. Among these PMs, the ACMA determines 62 vital PMs that have the most influence on the variance of the target SPI using several statistical methods. Then, we implement a performance anomaly detection model to identify the causative metrics (CMs) between the vital PMs using random forest regression. Even if the target SPI changes, our detection model does not require any change in its model structure and can derive closely related PMs of the target SPI. Based on our experiments, wherein we applied the ACMA to the business services in an enterprise system, we observed that the proposed ACMA could correctly detect various performance anomalies and their CMs.

Keywords: anomaly detection; causative metrics; distributed systems; regression model.