Monitoring reliability in embedded processors - A multi-layer view
V. Chandra
2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC), 2014-06-01
DOI: https://doi.org/10.1145/2593069.2596682
Scaling to sub-20nm technology nodes changes the nature of reliability effects from abrupt functional failures to progressive degradation of the performance characteristics of devices and system components. Further, application workloads can significantly affect overall system reliability. In this work, we analyze aging effects at various levels of the design hierarchy of a commercial embedded processor, implemented in 28nm technology and running real-world applications. We also quantify how aging effects depend on the switching activity and power state of workloads. Implementation results show that processor timing degradation can vary from 2% to 11%, depending on the workload. Because aging depends on the application workload, margin-based design will be highly pessimistic. We propose an efficient and flexible in situ monitoring methodology, SlackProbe, which inserts timing monitors at both path endpoints and intermediate nets along paths. We show that SlackProbe reduces the number of monitors required by more than 15X, at the cost of roughly 5% additional delay margin, in several commercial processor benchmarks. The real-time data from these monitors can be used for hardware and software adaptation to mitigate failures due to aging.
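
The abstract does not describe how SlackProbe chooses monitor locations, so the following is only an illustrative sketch of why intermediate nets can reduce the monitor count: a net shared by several critical paths can stand in for one monitor per endpoint. It uses a generic greedy set cover, which is an assumption and not the paper's algorithm. The path and net names, and the function `pick_monitor_nets`, are hypothetical, and the sketch ignores the extra delay margin that monitoring at an intermediate net requires.

```python
from typing import Dict, List, Set

# Hypothetical critical paths, each listed as the nets it passes through,
# ending with its endpoint flip-flop. Names are illustrative only.
PATHS: Dict[str, List[str]] = {
    "p1": ["n_shared", "ff_a"],
    "p2": ["n_shared", "ff_b"],
    "p3": ["n_shared", "ff_c"],
    "p4": ["n_other", "ff_d"],
}


def pick_monitor_nets(paths: Dict[str, List[str]]) -> List[str]:
    """Greedy set cover: pick nets until every path carries a monitor.

    This is a generic heuristic used for illustration, not SlackProbe itself.
    """
    # Invert the mapping: net -> set of paths that pass through it.
    net_to_paths: Dict[str, Set[str]] = {}
    for path, nets in paths.items():
        for net in nets:
            net_to_paths.setdefault(net, set()).add(path)

    uncovered: Set[str] = set(paths)
    chosen: List[str] = []
    while uncovered and net_to_paths:
        # Choose the net covering the most still-uncovered paths.
        # Iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(net_to_paths),
                   key=lambda n: len(net_to_paths[n] & uncovered))
        covered = net_to_paths[best] & uncovered
        if not covered:
            break  # remaining paths have no nets and cannot be covered
        chosen.append(best)
        uncovered -= covered
    return chosen


if __name__ == "__main__":
    monitors = pick_monitor_nets(PATHS)
    endpoint_only = len(PATHS)  # one monitor per path endpoint
    print(f"endpoint-only monitors: {endpoint_only}")
    print(f"greedy selection: {monitors} ({len(monitors)} monitors)")
```

In this toy example, monitoring only endpoints needs four monitors, while the greedy selection needs two because `n_shared` lies on three of the four paths. A realistic flow would also have to weigh each candidate net against the delay margin it adds, which is the trade-off the abstract quantifies as roughly 5%.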