Title: Workload-Cognizant Impact Analysis and its Applications in Error Detection and Tolerance in Modern Microprocessors
Authors: Y. Makris
Venue: 2009 24th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems
Publication date: 2009-10-07
DOI: 10.1109/DFT.2009.64 (https://doi.org/10.1109/DFT.2009.64)
Citations: 0
Abstract
The research presented in this talk investigates the relative importance of errors in a modern microprocessor, based on their impact on the execution of typical workloads. Such information is useful for allocating resources to enhance on-line testability and error resilience through concurrent error detection/correction methods. Indeed, modern microprocessors are inherently effective at suppressing a significant percentage of errors and preventing them from interfering with correct program execution (i.e., application-level masking). Therefore, understanding and leveraging the correlation between low-level errors and their instruction-level impact is crucial to developing cost-effective mitigation methods. To this end, I will first report on an extensive fault simulation infrastructure that we developed around a superscalar, dynamically scheduled, out-of-order, Alpha-like microprocessor, which supports execution of SPEC2000 integer benchmarks and enables the aforementioned correlation study. Then, I will demonstrate the utility of this information in developing cost-effective concurrent error detection and soft error mitigation methods for modern microprocessors. Finally, I will discuss the application of workload-cognizant impact analysis in identifying and dealing with faults that do not affect functional correctness but simply slow down program execution in modern microprocessors (i.e., performance faults).
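The three outcomes the abstract distinguishes (masked errors, errors that corrupt program results, and performance faults) can be illustrated with a toy fault-injection campaign. The sketch below is not the simulation infrastructure described in the talk. It assumes a hypothetical three-register machine that sums the even entries of a list, and every name in it (`run`, `classify`, `campaign`, the `acc`/`scratch`/`pred` state and the 3-cycle misprediction penalty) is invented for illustration.

```python
from collections import Counter


def run(values, fault=None):
    """Sum the even entries of `values` on a toy machine (hypothetical model).

    `fault` is None or a tuple (step, target, bit): before iteration `step`,
    one bit of the state element named by `target` is flipped.
    Returns (result, cycles).
    """
    acc = 0      # architectural state: feeds the program output
    scratch = 0  # temporary that is always overwritten before it is read
    pred = 0     # 1-bit branch predictor: affects timing only
    cycles = 0
    for step, v in enumerate(values):
        if fault is not None and fault[0] == step:
            _, target, bit = fault
            if target == "acc":
                acc ^= 1 << bit
            elif target == "scratch":
                scratch ^= 1 << bit
            elif target == "pred":
                pred ^= 1
        scratch = v & 1               # overwrite: an earlier flip is masked
        taken = scratch == 0
        # Assumed cost model: 1 cycle on a correct prediction, 3 on a miss.
        cycles += 1 if pred == int(taken) else 3
        pred = int(taken)
        if taken:
            acc += v
    return acc, cycles


def classify(golden, observed):
    """Compare a faulty run against the fault-free (golden) run."""
    if observed[0] != golden[0]:
        return "corrupted"    # wrong program output
    if observed[1] != golden[1]:
        return "performance"  # correct output, different cycle count
    return "masked"           # no observable effect


def campaign(values, bits=8):
    """Inject every single-bit fault site once and tally the outcomes."""
    golden = run(values)
    counts = Counter()
    for step in range(len(values)):
        sites = [(step, "acc", b) for b in range(bits)]
        sites += [(step, "scratch", b) for b in range(bits)]
        sites.append((step, "pred", 0))
        for site in sites:
            counts[classify(golden, run(values, site))] += 1
    return counts


if __name__ == "__main__":
    print(campaign([2, 4, 6, 8]))
```

In this toy model, flips in `scratch` are always masked, flips in `acc` always corrupt the result, and flips in `pred` only change the cycle count. A real processor has no such clean separation, which is why a workload-driven simulation study is needed to measure the correlation.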