Workload-Cognizant Impact Analysis and its Applications in Error Detection and Tolerance in Modern Microprocessors
Y. Makris
2009 24th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, 2009-10-07
DOI: 10.1109/DFT.2009.64
The objective of the research presented in this talk is to investigate the relative importance of errors in a modern microprocessor, based on the impact they have on the execution of typical workloads. Such information can prove immensely useful in allocating resources to enhance on-line testability and error resilience through concurrent error detection/correction methods. Indeed, modern microprocessors are inherently effective at suppressing a significant percentage of errors and preventing them from interfering with correct program execution (i.e., application-level masking). Therefore, understanding and leveraging the correlation between low-level errors and their instruction-level impact is crucial to developing cost-effective mitigation methods. To this end, I will first report on an extensive fault simulation infrastructure that we developed around a superscalar, dynamically scheduled, out-of-order, Alpha-like microprocessor, which supports execution of SPEC2000 integer benchmarks and enables the aforementioned correlation study. Then, I will demonstrate the utility of this information in developing cost-effective concurrent error detection and soft error mitigation methods for modern microprocessors. Finally, I will discuss the application of workload-cognizant impact analysis in identifying and dealing with faults that do not affect functional correctness but simply slow down program execution in modern microprocessors (i.e., performance faults).
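To make the idea of application-level masking concrete, the following is a minimal, purely illustrative sketch. It is not the fault simulation infrastructure described in the talk. It assumes a hypothetical four-register, 8-bit toy machine and a six-instruction toy workload, injects every possible single bit flip into the register file, and compares each faulty run against a fault-free ("golden") run. All names here (`PROGRAM`, `run`, `campaign`) are invented for the illustration.

```python
from collections import Counter

NUM_REGS = 4   # hypothetical toy register file
WIDTH = 8      # bits per register

# Toy workload (an assumption for illustration): (op, dst, a, b)
PROGRAM = [
    ("li", 0, 5, None),    # r0 = 5
    ("li", 1, 3, None),    # r1 = 3
    ("li", 2, 99, None),   # r2 = 99 (dead value, overwritten below)
    ("add", 3, 0, 1),      # r3 = r0 + r1
    ("li", 2, 1, None),    # r2 = 1
    ("and", 3, 3, 2),      # r3 = r3 & r2 (only bit 0 of r3 survives)
]


def run(fault=None):
    """Execute PROGRAM and return r3, the program's only output.

    fault is None for the golden run, or a (step, reg, bit) tuple that
    flips one register bit just before the given step executes.
    """
    regs = [0] * NUM_REGS
    mask = (1 << WIDTH) - 1
    for step, (op, dst, a, b) in enumerate(PROGRAM):
        if fault is not None and fault[0] == step:
            regs[fault[1]] ^= 1 << fault[2]
        if op == "li":
            regs[dst] = a & mask
        elif op == "add":
            regs[dst] = (regs[a] + regs[b]) & mask
        elif op == "and":
            regs[dst] = regs[a] & regs[b]
    return regs[3]


def campaign():
    """Inject every single-bit fault and classify its effect on the output."""
    golden = run()
    outcome = Counter()
    for step in range(len(PROGRAM)):
        for reg in range(NUM_REGS):
            for bit in range(WIDTH):
                same = run((step, reg, bit)) == golden
                outcome["masked" if same else "corrupted"] += 1
    return outcome


if __name__ == "__main__":
    print(campaign())
```

Even in this toy setting, many injected flips never reach the output, because the affected value is dead, overwritten, or logically masked by the final `and`. A real workload-cognizant study would inject faults into a detailed processor model running actual benchmarks. The sketch only shows the golden-run comparison that such a classification rests on.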