{"title":"A Framework for Supporting Adaptive Fault-Tolerant Solutions","authors":"K. Siozios, D. Soudris, M. Hübner","doi":"10.1145/2629473","journal":{"name":"ACM Trans. Embed. Comput. Syst."},"publicationDate":"2014-12-15","publicationTypes":"Journal Article","isOpenAccess":false,"citationCount":"4","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1145/2629473"}
A Framework for Supporting Adaptive Fault-Tolerant Solutions
For decades, computer architects pursued one primary goal: performance. The ever-faster transistors provided by Moore's law were translated into remarkable gains in operating frequency and power consumption. However, device sizes and architectural complexity impose several new challenges, including reduced dependability due to physical failures. In this article we propose a software-supported methodology, based on game theory, for adapting the aggressiveness of fault tolerance at runtime. Experimental results demonstrate the efficiency of our solution: it achieves fault masking comparable to that of relevant solutions, but at significantly lower mitigation cost. More specifically, our framework speeds up the identification of resources suspected of failure by 76% on average compared to the HotSpot tool. Similarly, the introduced solution yields average power-delay product (PDP) savings of 53% over an existing triple modular redundancy (TMR) approach.
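
For readers unfamiliar with the two quantities the abstract compares against, the following minimal Python sketch illustrates a generic TMR majority voter and how a relative PDP saving is computed. It is not the authors' framework or their game-theoretic method. The function names (tmr_vote, pdp, pdp_savings) and all numeric values are hypothetical, chosen only for illustration.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority vote over three replica outputs (generic TMR).

    Each output bit takes the value held by at least two of the three
    replicas, so a fault confined to one replica is masked.
    """
    return (a & b) | (a & c) | (b & c)


def pdp(power_w: float, delay_s: float) -> float:
    """Power-delay product in joules (watts times seconds)."""
    return power_w * delay_s


def pdp_savings(baseline: float, candidate: float) -> float:
    """Fractional PDP saving of `candidate` relative to `baseline`."""
    return 1.0 - candidate / baseline


# Hypothetical example: the third replica has its top bit flipped.
voted = tmr_vote(0b1011, 0b1011, 0b0011)

# Hypothetical figures, not taken from the paper.
baseline_pdp = pdp(power_w=2.0, delay_s=10e-9)   # always-on TMR
adaptive_pdp = pdp(power_w=1.0, delay_s=12e-9)   # adaptive scheme
saving = pdp_savings(baseline_pdp, adaptive_pdp)
```

With these made-up inputs the voter recovers the value held by the two agreeing replicas, and the saving is the fraction by which the candidate's PDP falls below the baseline's. The 53% reported in the abstract is a figure of this kind, averaged over the authors' experiments.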