A systematic fault-tolerant computational model for both crash failures and silent data corruption
Xiaolong Cui, Zaeem Hussain, T. Znati, R. Melhem
2018 21st Conference on Innovation in Clouds, Internet and Networks and Workshops (ICIN), published 2018-02-01
DOI: 10.1109/ICIN.2018.8401596 (https://doi.org/10.1109/ICIN.2018.8401596)
Citation count: 1
Abstract
As the boundaries between Cloud and HPC continue to blur, it is clear that there is an urgent demand for a systematic computational model that adapts to the computing platform and accommodates the underlying workloads. As computing systems continue to scale out to satisfy the increasingly large demands on computing capacity, power awareness and fault tolerance have become major concerns. This paper proposes a novel computational model that applies to both compute- and data-intensive workloads, and deals with diverse types of faults. Evaluation results demonstrate that the proposed model is able to achieve significant energy savings compared to existing fault tolerance techniques, while maintaining the same level of fault tolerance.