Lanyue Lu, P. Sarkar, Dinesh Subhraveti, S. Sarkar, Mark Seaman, Reshu Jain, Ahmed Bashir
2009 29th IEEE International Conference on Distributed Computing Systems, published 22 June 2009. DOI: 10.1109/ICDCS.2009.58
CARP: Handling Silent Data Errors and Site Failures in an Integrated Program and Storage Replication Mechanism
This paper presents CARP, an integrated program and storage replication solution. CARP extends program replication systems, which currently do not address storage errors. It builds upon a record-and-replay scheme that handles nondeterminism in program execution, and it uses recorded program state and I/O logs to detect silent data errors and recover from them efficiently. CARP is designed to be transparent to applications with minimal run-time impact, and it is general enough to be implemented on commodity machines. We implemented CARP as a prototype on the Linux operating system and conducted an extensive sensitivity analysis of its overhead across different application profiles and system parameters. In particular, we evaluated CARP with standard, unmodified email, database, and web server benchmarks and showed that it imposes acceptable overhead while providing sub-second program state recovery times after a silent data error is detected.
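The abstract does not describe CARP's implementation, so the following is only a rough illustration of the general idea that an I/O log can support both detection of and recovery from silent data errors. It assumes a simplified, single-node setting. Every write is appended to a log together with a SHA-256 checksum. Each read is checked against the checksum of the most recent logged write to that block. A mismatch, meaning the disk returned data without reporting an error but the data no longer matches what was written, is repaired by replaying the logged writes for that block. The names `LoggedStore`, `write`, `read`, and `replay_block` are hypothetical and do not come from the paper. A practical design would presumably also need checkpoints to bound log growth, along with a remote replica to survive site failures.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def digest(data: bytes) -> str:
    """Checksum used to detect silent corruption of a block."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class LogEntry:
    seq: int        # position of the write in the I/O log
    block: int      # block number written
    data: bytes     # payload, kept so the write can be replayed
    checksum: str   # digest(data) computed at write time


@dataclass
class LoggedStore:
    """Hypothetical store: disk blocks plus an append-only I/O log (not CARP's API)."""
    disk: Dict[int, bytes] = field(default_factory=dict)
    log: List[LogEntry] = field(default_factory=list)
    errors_detected: int = 0

    def write(self, block: int, data: bytes) -> None:
        # Record the write in the log before applying it to the disk image.
        self.log.append(LogEntry(len(self.log), block, data, digest(data)))
        self.disk[block] = data

    def replay_block(self, block: int) -> Optional[bytes]:
        # Re-apply the logged writes for this block in order; the last one wins.
        contents = None
        for entry in self.log:
            if entry.block == block:
                contents = entry.data
        return contents

    def expected_checksum(self, block: int) -> Optional[str]:
        # Checksum of the most recent logged write to this block, if any.
        for entry in reversed(self.log):
            if entry.block == block:
                return entry.checksum
        return None

    def read(self, block: int) -> bytes:
        data = self.disk[block]
        expected = self.expected_checksum(block)
        if expected is not None and digest(data) != expected:
            # Silent data error: the read "succeeded" but the contents changed.
            self.errors_detected += 1
            repaired = self.replay_block(block)
            if repaired is None:
                raise RuntimeError(f"no logged write for block {block}")
            self.disk[block] = repaired  # write the repaired block back
            return repaired
        return data


store = LoggedStore()
store.write(7, b"balance=100")
store.disk[7] = b"balance=900"  # simulate a single bit flip ('1' -> '9') on disk
print(store.read(7), store.errors_detected)  # prints: b'balance=100' 1
```

In this sketch, detection costs one checksum per read, while recovery cost grows with log length. This is one reason a real system would likely pair the log with periodic checkpoints of program and storage state.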