Title: An end-to-end intelligent monitoring system based on pinpoint
Authors: Lanying Shi, Huan Liang, Wensheng Yao, Jingxiang Chen, Chunhua Chen, Yong Chen, Chen-Jei Yang, Mengxia Chen, Yiquan Jiang, Jiangang Tong, Man Li, Hongming Qiao
Venue: 2022 IEEE 4th International Conference on Power, Intelligent Computing and Systems (ICPICS)
Publication date: 2022-07-29
DOI: 10.1109/ICPICS55264.2022.9873689
URL: https://doi.org/10.1109/ICPICS55264.2022.9873689
Citations: 2
Abstract
The wide adoption of distributed architectures brings new challenges to operation and maintenance. The number of system nodes and microservices has increased exponentially, and the monitoring workload has risen sharply. The relationships between monitored objects are extremely complex, and manual maintenance can no longer cope. The traditional maintenance mode is difficult to sustain because data is fragmented and stored remotely. Traditional operation and maintenance has the following shortcomings: 1) Under the two-level group/provincial maintenance system, operation and maintenance is decentralized. As a result, network-wide business support cannot be controlled effectively, and network-wide problem/fault scheduling does not run smoothly. 2) The network-wide monitoring system is built separately for each business, with scattered monitoring data and outdated monitoring methods, which makes it difficult to locate problems across businesses. 3) Traditional maintenance is oriented to a single system and a single business and does not focus on end-to-end customer perception. 4) For a single system, handling problems and faults that cross domains or layers is slow and time-consuming, and accurate fault location and rapid fault recovery cannot be achieved. This paper proposes an end-to-end intelligent monitoring system based on Pinpoint. It is an intensive operation and maintenance platform for cloud systems that supports cross-domain monitoring and monitoring across the IaaS, PaaS, and SaaS layers. It is a shared operation and maintenance platform whose platform/application architecture is built on big data and AI technology. At the service level, it provides end-to-end, cross-service monitoring throughout the network. At the application level, it can be used across the whole network to find and locate faults quickly. With the end-to-end distributed cloud monitoring system in use, fault discovery time is greatly shortened, and fault handling time is reduced from hours to minutes. At the same time, system downtime is greatly shortened, and operation and maintenance efficiency is improved.