An end-to-end intelligent monitoring system based on pinpoint

Lanying Shi, Huan Liang, Wensheng Yao, Jingxiang Chen, Chunhua Chen, Yong Chen, Chen-Jei Yang, Mengxia Chen, Yiquan Jiang, Jiangang Tong, Man Li, Hongming Qiao
Published in: 2022 IEEE 4th International Conference on Power, Intelligent Computing and Systems (ICPICS)
Publication date: 2022-07-29
DOI: 10.1109/ICPICS55264.2022.9873689 (https://doi.org/10.1109/ICPICS55264.2022.9873689)
Citations: 2

Abstract

The wide adoption of distributed architectures brings new challenges to operation and maintenance (O&M). The number of system nodes and microservices has increased exponentially, and the monitoring workload has risen sharply. The relationships between monitored objects are extremely complex, and manual maintenance can no longer cope. The traditional maintenance mode is difficult to sustain because data are fragmented and stored remotely. Traditional O&M has the following shortcomings: 1) Under the two-level group/provincial maintenance system, O&M is decentralized; as a result, network-wide business support cannot be effectively controlled, and network-wide problem and fault scheduling does not run smoothly. 2) Network-wide monitoring systems are built separately for each business, with scattered monitoring data and outdated monitoring methods, which makes it difficult to locate problems across businesses. 3) Traditional maintenance is oriented to a single system and a single business and does not focus on end-to-end customer perception. 4) Handling problems and faults that cross domains or layers within a single system is slow and time-consuming, and neither accurate fault location nor rapid fault recovery can be achieved. This paper proposes an end-to-end intelligent monitoring system based on pinpoint. It is an intensive O&M platform for cloud systems that supports cross-domain monitoring and monitoring across the IaaS, PaaS, and SaaS layers. It is a shared O&M platform whose platform and application architecture is built on big data and AI technology. At the service level, it provides end-to-end, cross-service monitoring throughout the network. At the application level, it can be used across the whole network to find and locate faults quickly. After the end-to-end distributed cloud monitoring system is deployed, fault discovery time is greatly shortened, and fault handling time is reduced from hours to minutes. System fault duration is also greatly shortened, and O&M efficiency is improved.
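
The abstract does not describe how faults are located across services and layers. As an illustration only, the sketch below assumes that each request is recorded as a call chain of spans linked by parent IDs, and it treats the deepest failing span as the likely fault origin. The `Span` fields, the `locate_fault` helper, and the service names are all hypothetical; they are not taken from the paper or from any real monitoring API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Span:
    """One hop of a request. Field names are hypothetical, not a real schema."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None for the entry span of the request
    service: str
    layer: str                # e.g. "SaaS", "PaaS", "IaaS"
    error: bool


def locate_fault(spans: List[Span], trace_id: str) -> Optional[Span]:
    """Return the deepest failing span of one trace, or None if nothing failed."""
    in_trace = [s for s in spans if s.trace_id == trace_id]
    by_id = {s.span_id: s for s in in_trace}

    def depth(span: Span) -> int:
        # Follow parent links toward the entry span; the step cap guards
        # against malformed data whose parent links form a cycle.
        steps = 0
        while span.parent_id in by_id and steps < len(by_id):
            span = by_id[span.parent_id]
            steps += 1
        return steps

    failing = [s for s in in_trace if s.error]
    return max(failing, key=depth) if failing else None


# Hypothetical call chain: the error surfaces at the portal but starts at the host.
spans = [
    Span("t1", "a", None, "web-portal", "SaaS", True),
    Span("t1", "b", "a", "order-service", "PaaS", True),
    Span("t1", "c", "b", "db-host", "IaaS", True),
    Span("t1", "d", "a", "billing-service", "PaaS", False),
]
fault = locate_fault(spans, "t1")
if fault is not None:
    print(fault.service, fault.layer)
```

Choosing the deepest failing span is a deliberately simple heuristic: errors usually propagate upward from the failing dependency to its callers, so the span furthest from the entry point is a reasonable first place to look. A production system would also weigh timing, retries, and infrastructure metrics.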