Bi-Source Verification Against Silent Data Corruption in High Performance Computing
Era Ajdaraga Krluku, M. Gusev, Vladimir Zdraveski
Proceedings of the 9th Balkan Conference on Informatics, 2019-09-26
DOI: 10.1145/3351556.3351567
This paper proposes a continuous health-check approach for detecting Silent Data Corruption (SDC) in High Performance Computing (HPC) environments. The goal is to minimize the effect of hardware errors on the overall reliability and accuracy of the system by monitoring and validating the correctness of its data. Our work compares the advantages and shortcomings of two approaches to overcoming SDC: threshold-triggered verification and continuous verification. Our results show that, of these two methods, continuous verification achieves lower latency.
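The abstract does not say how either method is implemented. The following Python sketch is therefore a hypothetical toy model and not the authors' method. It makes several assumptions:

- "Bi-source" means comparing a primary result against an independently computed replica.
- The threshold trigger is an accumulated count of hardware error reports.
- All names are illustrative: `flip_bit`, `continuous_detect`, `threshold_detect`, `error_reports` and `threshold`.

The sketch shows why checking at every step, using the same comparison, detects a corrupted value no later than checking only when a trigger fires.

```python
import math
import struct
from typing import List, Optional, Sequence

def flip_bit(x: float, bit: int) -> float:
    """Simulate silent corruption by flipping one bit of a float64 (bit 0 = LSB)."""
    (raw,) = struct.unpack("<Q", struct.pack("<d", x))
    (y,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return y

def continuous_detect(primary: Sequence[float],
                      replica: Sequence[float]) -> Optional[int]:
    """Hypothetical continuous mode: compare both sources after every step.

    Returns the step at which a mismatch is noticed, or None.
    """
    for step, (a, b) in enumerate(zip(primary, replica)):
        if not math.isclose(a, b, rel_tol=1e-9):
            return step
    return None

def threshold_detect(primary: Sequence[float], replica: Sequence[float],
                     error_reports: Sequence[int],
                     threshold: int) -> Optional[int]:
    """Hypothetical threshold-triggered mode: compare the sources only once the
    accumulated error-report count reaches `threshold`.

    Each triggered check re-validates every step since the previous check and
    returns the step at which the check ran (when the mismatch is noticed).
    """
    pending = 0
    first_unchecked = 0
    for step, reports in enumerate(error_reports):
        pending += reports
        if pending < threshold:
            continue  # no trigger yet, so no verification at this step
        pending = 0
        for s in range(first_unchecked, step + 1):
            if not math.isclose(primary[s], replica[s], rel_tol=1e-9):
                return step
        first_unchecked = step + 1
    return None

# Toy run: 10 steps, with a single bit flip in the primary source at step 3.
steps = 10
replica: List[float] = [1.0] * steps
primary: List[float] = list(replica)
corrupted_step = 3
primary[corrupted_step] = flip_bit(primary[corrupted_step], 51)  # 1.0 -> 1.5
error_reports = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]  # hypothetical per-step counts

cont = continuous_detect(primary, replica)
trig = threshold_detect(primary, replica, error_reports, threshold=2)
print("continuous latency:", cont - corrupted_step)            # 0
print("threshold-triggered latency:", trig - corrupted_step)   # 3
```

In this toy run, the continuous check flags the flipped value at the step where it occurs. The triggered check notices it three steps later, once the second error report arrives. The sketch leaves out the cost side of the trade-off: continuous verification performs a comparison at every step.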