DYNAMICALLY PRIORITIZED FAILURE MANAGEMENT ACCORDING TO RELIABILITY MODEL IN LARGE-SCALE DATA CENTER
H. Okita, Hayato Hoshihara, N. Komoda, T. Fujiwara
IADIS International Journal on WWW/Internet, published 2020-12-10. DOI: 10.33965/ijwi_202018208
Abstract
We propose a dynamically prioritized failure management method based on a reliability model in which the failure rate of a virtual machine (VM) varies over its life cycle. When server monitoring with ping is combined with network connection checks using Ethernet OAM, the system assigns higher priority to ports connected to long-running servers and to ports within a certain time after a connection change or VM addition. The system then selects the ports to be monitored by Ethernet OAM in descending order of priority. A simulation-based evaluation confirmed that, when the ports monitored by Ethernet OAM are selected dynamically with the proposed method, more than one third of all failures were detected using a number of Maintenance End Points (MEPs) equal to only one tenth of the number of servers in the data center. In data centers for cloud services running many VMs, this method can therefore shorten recovery from VM failures while limiting the number of objects monitored by Ethernet OAM.
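The prioritization and port-selection steps described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a simple additive score in which a port gains priority for being in the early-life window after a change or for serving a long-running server (roughly the two high-failure ends of the life cycle; reading these as a bathtub-shaped curve is an assumption). The `Port` record, the `priority` and `select_oam_ports` functions, and the thresholds `EARLY_WINDOW_H` and `LONG_RUNNING_H` are hypothetical, because the abstract gives no concrete time windows or scoring rule.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds: the abstract says only "a certain time" after a change
# and "long running"; these values are placeholders, not the authors' parameters.
EARLY_WINDOW_H = 24.0          # hours after a connection change or VM addition
LONG_RUNNING_H = 24.0 * 365    # uptime beyond which a server counts as long-running


@dataclass
class Port:
    port_id: str
    uptime_h: float            # hours the attached server has been running
    since_change_h: float      # hours since the last connection change or VM addition


def priority(port: Port) -> int:
    """Illustrative score: a higher value means higher monitoring priority."""
    score = 0
    if port.since_change_h <= EARLY_WINDOW_H:
        score += 1  # early-life phase: recently changed connection or newly added VM
    if port.uptime_h >= LONG_RUNNING_H:
        score += 1  # late-life phase: long-running server
    return score


def select_oam_ports(ports: List[Port], budget: int) -> List[str]:
    """Return up to `budget` port IDs to monitor with Ethernet OAM, highest priority first.

    Python's sort is stable (also with reverse=True), so ports with equal
    priority keep their input order. Unselected ports are assumed to rely on
    ping-based server monitoring only.
    """
    ranked = sorted(ports, key=priority, reverse=True)
    return [p.port_id for p in ranked[:budget]]


if __name__ == "__main__":
    ports = [
        Port("p1", uptime_h=100.0, since_change_h=100.0),     # mid-life: score 0
        Port("p2", uptime_h=10_000.0, since_change_h=500.0),  # long-running: score 1
        Port("p3", uptime_h=2.0, since_change_h=2.0),         # just added: score 1
        Port("p4", uptime_h=50.0, since_change_h=50.0),       # mid-life: score 0
    ]
    print(select_oam_ports(ports, budget=2))  # prints ['p2', 'p3']
```

Because port ages advance over time, rerunning the selection periodically (for example after each connection change or VM addition) is what makes the set of monitored ports dynamic. In the evaluation setting described in the abstract, `budget` would correspond to roughly one tenth of the number of servers.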