Analysis Engine for Automated Health Checks, Early Problem Detection and Advanced Problem Determination
Yogendra K. Srivastava, A. Abrashkevich
2012 Annual SRII Global Conference, published 2012-07-24. DOI: 10.1109/SRII.2012.70 (https://doi.org/10.1109/SRII.2012.70)
As modern software grows more complex, products become more tightly integrated, and expectations for reliability and high availability rise, software support and maintenance costs increase dramatically. Businesses must be able to monitor the health of their systems to ensure they are performing at top levels, respond to problems quickly and fix them promptly, and perform advanced problem determination to reduce the total duration of outages that have already occurred. It is equally important to prevent problems from occurring by drawing on best practices and knowledge of known issues for specific software products. To achieve these goals, a powerful analysis engine is proposed that is capable of performing comprehensive health checks of customer systems and advanced problem determination based on analysis of customers' data. It can be used for both proactive and reactive customer support. Such an engine works as a virtual consultant for end users. It detects potential problems related to customer systems and installed products and proactively provides notifications or alerts, so it can be regarded as an early detection system. It is also capable of analyzing FFDC (First Failure Data Capture) data after a problem has occurred, comparing the data with well-known problems and related symptoms from relevant knowledge databases, and providing customers with the results of the analysis, matches against previously recorded problems, and recommendations on how to fix the problem at hand. The proposed engine uses up-to-date analytics from subject matter experts and the best practices encoded in it. In the present work, a system architecture and design for such an analysis engine are presented. The proposed engine has a low barrier to adoption and a flexible, extensible design, and could be easily adapted to any software product.
It is able to analyze encoded human knowledge, compare collected customer data with available historical data, and report the problems and issues found along with relevant recommendations and suggested fixes. More specifically, the engine provides comprehensive analysis covering health checks, best-practices compliance checks, prerequisite checks, end-of-service product checks, operating environment and configuration setup checks, outage prevention, state comparison, problem determination, and others. A case study based on the proposed engine design is presented and discussed in more detail.
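
The abstract does not specify how expert knowledge is encoded or how collected data is matched against known problems, so the following is only a minimal sketch under stated assumptions, not the authors' design. It illustrates the two modes described above: proactive health checks as rules evaluated against a snapshot of collected data, and reactive matching of log text against symptoms of known problems. All names (`Rule`, `Finding`, `run_health_checks`, `match_known_problems`), rule IDs, thresholds, and problem IDs are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Snapshot = Dict[str, Any]  # collected customer data, e.g. configuration values


@dataclass(frozen=True)
class Rule:
    """One piece of encoded expert knowledge (hypothetical structure)."""
    rule_id: str
    category: str                        # e.g. "prerequisite", "end-of-service"
    violated: Callable[[Snapshot], bool]  # returns True when the check fails
    recommendation: str


@dataclass(frozen=True)
class Finding:
    rule_id: str
    category: str
    recommendation: str


def run_health_checks(snapshot: Snapshot, rules: List[Rule]) -> List[Finding]:
    """Proactive mode: evaluate every rule and report the violated ones."""
    return [
        Finding(r.rule_id, r.category, r.recommendation)
        for r in rules
        if r.violated(snapshot)
    ]


def match_known_problems(log_lines: List[str],
                         known: Dict[str, List[str]]) -> List[str]:
    """Reactive mode: IDs of known problems whose symptoms all occur in the logs."""
    text = "\n".join(log_lines)
    return [pid for pid, symptoms in known.items()
            if all(symptom in text for symptom in symptoms)]


# Illustrative rules and knowledge base; every value here is invented.
RULES = [
    Rule("HC001", "prerequisite",
         lambda s: int(s.get("heap_mb", 0)) < 512,
         "Increase heap size to at least 512 MB."),
    Rule("HC002", "end-of-service",
         lambda s: s.get("product_version") in {"1.0", "1.1"},
         "Upgrade to a supported product version."),
]

KNOWN_PROBLEMS = {
    "KP-17": ["OutOfMemoryError", "GC overhead limit exceeded"],
    "KP-42": ["Connection refused"],
}

if __name__ == "__main__":
    snapshot = {"heap_mb": 256, "product_version": "2.0"}
    for finding in run_health_checks(snapshot, RULES):
        print(finding.rule_id, finding.category, finding.recommendation)

    logs = ["ERROR java.lang.OutOfMemoryError: GC overhead limit exceeded"]
    print(match_known_problems(logs, KNOWN_PROBLEMS))
```

Keeping each rule as data plus a small predicate is one way to get the extensibility the abstract mentions: supporting a new product would mean supplying a new rule list and symptom table rather than changing the evaluation code. Plain substring matching is a deliberate simplification; a real knowledge base would likely need richer symptom patterns.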