Data-loss models for proactive-tolerance Reed–Solomon storage systems
Authors: Jing Li, Zhenrui Zhou, Jianli Ding
Journal: Future Generation Computer Systems-The International Journal of Escience, volume 170, Article 107832 (published 2025-04-02)
DOI: 10.1016/j.future.2025.107832
URL: https://www.sciencedirect.com/science/article/pii/S0167739X2500127X
Impact factor: 6.2; JCR Q1 (Computer Science, Theory & Methods)
Proactive fault tolerance increasingly serves as added protection for data in Reed–Solomon (RS) systems. Compared with declustered placement, grouped placement reduces the number of failure units but also decreases repair parallelism; these two changes have opposing effects on system reliability. For an RS(k, m) system, the values of (k, m) affect storage overhead, fault tolerance, and repair traffic. When designing proactive RS storage systems, it is challenging to choose the proper placement scheme and coding scheme.
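As a rough illustration of how (k, m) drives the three quantities named above, the sketch below computes them for a conventional RS code. The class and property names are hypothetical and are not taken from the paper. It assumes the textbook repair procedure, in which rebuilding one lost block reads k surviving blocks of the same stripe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RSCode:
    """Hypothetical helper describing an RS(k, m) code (not from the paper)."""
    k: int  # data blocks per stripe
    m: int  # parity blocks per stripe

    @property
    def storage_overhead(self) -> float:
        # Raw bytes stored per byte of user data.
        return (self.k + self.m) / self.k

    @property
    def fault_tolerance(self) -> int:
        # A stripe survives the loss of any m of its k + m blocks.
        return self.m

    @property
    def repair_reads(self) -> int:
        # Assumption: conventional RS repair reads k surviving blocks
        # to rebuild a single lost block.
        return self.k


if __name__ == "__main__":
    for code in (RSCode(6, 3), RSCode(12, 4)):
        print(code, code.storage_overhead, code.fault_tolerance, code.repair_reads)
```

Under these assumptions, RS(12, 4) stores less redundant data than RS(6, 3) and tolerates one more lost block per stripe, but each single-block repair reads twice as many blocks.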
This paper presents four general reliability equations for estimating the number of data-loss events and the amount of data loss in proactive RS systems using declustered and grouped placement schemes. These equations model the effect of disk/node failures, repair bandwidth, block errors, disk scrubbing, disk/node failure prediction, stripe placement, and coding scheme on system reliability. Moreover, we design a Monte Carlo-based simulator to analyze the reliability of proactive Reed–Solomon systems. The results of the equations agree well with the simulation results, which demonstrates the effectiveness of the proposed equations. Using these mathematical models, we can easily estimate and compare fault-tolerance schemes and placement schemes and learn the effect of system parameters on system reliability, which facilitates the maintenance and design of cloud storage systems.
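To give a feel for what a Monte Carlo reliability estimate involves, here is a deliberately simplified sketch. It is not the authors' simulator: it models a single stripe spread over k + m disks, and it assumes exponentially distributed disk lifetimes and a fixed repair time. It ignores failure prediction, scrubbing, block errors, repair bandwidth, and placement. The function name and all parameters are hypothetical.

```python
import random


def estimate_loss_probability(n_disks, m, mttf_hours, repair_hours,
                              mission_hours, trials=10_000, seed=0):
    """Fraction of trials in which more than m disks are down at once.

    Toy model (an assumption, not the paper's): one stripe over n_disks
    disks, exponential lifetimes with mean mttf_hours, fixed repair time.
    """
    rng = random.Random(seed)
    losses = 0
    for _ in range(trials):
        # Time of the next failure of each disk.
        next_fail = [rng.expovariate(1.0 / mttf_hours) for _ in range(n_disks)]
        repair_done = []  # completion times of repairs still in progress
        while True:
            # Process failures in time order.
            i = min(range(n_disks), key=next_fail.__getitem__)
            t = next_fail[i]
            if t > mission_hours:
                break  # mission ended without data loss
            # Forget repairs that completed before this failure.
            repair_done = [r for r in repair_done if r > t]
            repair_done.append(t + repair_hours)
            if len(repair_done) > m:
                losses += 1  # more than m blocks of the stripe are missing
                break
            # The rebuilt disk starts a fresh lifetime after its repair.
            next_fail[i] = t + repair_hours + rng.expovariate(1.0 / mttf_hours)
    return losses / trials


if __name__ == "__main__":
    # Hypothetical parameters: RS(6, 3), 9 disks, 5-year mission.
    p = estimate_loss_probability(9, 3, mttf_hours=100_000, repair_hours=24,
                                  mission_hours=5 * 8760, trials=2_000)
    print(f"estimated data-loss probability: {p:.4f}")
```

Comparing such an estimate with a closed-form prediction for the same parameters is the kind of cross-check the abstract describes, although the paper's models cover many more factors than this toy version.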
About the journal:
Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications.
Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration.
Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.