Software Aging Detection and Rejuvenation Assessment in Heterogeneous Virtual Networks
Alberto Avritzer; Andrea Janes; Andrea Marin; Catia Trubiani; Andre van Hoorn; Matteo Camilli; Daniel S. Menasché; André B. Bondi
IEEE Transactions on Emerging Topics in Computing, vol. 13, no. 2, pp. 299-313 (published 2025-03-11)
DOI: 10.1109/TETC.2025.3547612
https://ieeexplore.ieee.org/document/10923615/
Abstract
In this article, we report on resiliency enforcement strategies applied to a microservices system running on a real-world deployment of a large cluster of heterogeneous Virtual Machines (VMs). We present evaluation results obtained from both measurement and modeling. The measurement infrastructure consisted of 15 large and 15 extra-large VMs; the modeling approach used Markov Decision Processes (MDP). On the measurement testbed, we implemented software rejuvenation at three levels of granularity to achieve software resiliency. We discovered two threats to resiliency in this environment. The first was a memory leak in the underlying open-source infrastructure running in each VM. The second was contention for resources on the physical host, which depends on the number and size of the VMs deployed to that host. In the MDP modeling approach, we evaluated four strategies for assigning tasks to VMs with different configurations and different levels of parallelism. Using the large cluster under study, we compared our software aging detection and rejuvenation approach with the state-of-the-art approach of deploying a network of VMs to a private cloud without software aging detection and rejuvenation. In summary, we show that in a private cloud with non-elastic resource allocation on the physical hosts, careful performance engineering is needed to optimize the trade-off between the number of VMs allocated and the total memory allocated to each VM.
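A minimal sketch of one common way to detect memory-leak-driven software aging, assuming periodic memory-usage samples are collected from each VM: fit a least-squares trend to the samples, treat a sustained positive slope as aging, and schedule rejuvenation when projected memory exhaustion falls inside a safety horizon. This is a generic illustration, not the detection or rejuvenation method used in the article; `detect_aging`, `AgingVerdict`, and the default thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class AgingVerdict:
    slope_mb_per_h: float                 # fitted memory growth rate (MB per hour)
    hours_to_exhaustion: Optional[float]  # None when growth is below the aging threshold
    rejuvenate: bool                      # True if a restart should be scheduled now


def linear_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("timestamps must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def detect_aging(
    hours: Sequence[float],
    used_mb: Sequence[float],
    limit_mb: float,
    slope_threshold_mb_per_h: float = 1.0,  # hypothetical: growth below this is treated as noise
    horizon_h: float = 24.0,                # hypothetical: rejuvenate if exhaustion is this close
) -> AgingVerdict:
    """Hypothetical aging detector: linear trend plus time-to-exhaustion check."""
    slope = linear_slope(hours, used_mb)
    if slope <= slope_threshold_mb_per_h:
        # No meaningful upward trend, so no aging is flagged.
        return AgingVerdict(slope, None, False)
    # Project how long until the latest usage reaches the memory limit at this rate.
    hours_left = (limit_mb - used_mb[-1]) / slope
    return AgingVerdict(slope, hours_left, hours_left <= horizon_h)


if __name__ == "__main__":
    verdict = detect_aging([0, 1, 2, 3, 4], [1000, 1050, 1100, 1150, 1200], limit_mb=2000)
    print(verdict)
```

With five hourly samples growing by 50 MB per hour and a 2000 MB limit, the fitted slope is 50 MB/h and exhaustion is projected in 16 hours, so this sketch would trigger rejuvenation under its 24-hour horizon. A linear trend is the simplest choice; real aging detectors often use more robust trend tests to avoid reacting to short bursts.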
About the journal:
IEEE Transactions on Emerging Topics in Computing publishes papers on emerging aspects of computer science, computing technology, and computing applications not currently covered by other IEEE Computer Society Transactions. Some examples of emerging topics in computing include: IT for Green, Synthetic and organic computing structures and systems, Advanced analytics, Social/occupational computing, Location-based/client computer systems, Morphic computer design, Electronic game systems, & Health-care IT.