{"authors":"M. Skitsas, C. Nicopoulos, M. Michael","doi":"10.1109/DFT.2014.6962088","journal":{"name":"2014 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT)"},"publicationDate":"2014-11-24","citationCount":"5","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1109/DFT.2014.6962088"}
Exploration of system availability during software-based self-testing in many-core systems under test latency constraints
As technology scales, the increased vulnerability of modern systems to unreliable components becomes a major problem in the era of multi-/many-core architectures. Several on-line testing techniques have recently been proposed to detect errors caused by wear-out/aging-related defects that can appear during the lifetime of a system. In this work, we investigate the relation between system test latency and test-time overhead in multi-/many-core systems with a shared Last-Level Cache (LLC) for periodic Software-Based Self-Testing (SBST), under different test scheduling policies. The investigated scheduling policies primarily vary the number of cores concurrently under test in the overall system testing session. Our extensive, workload-driven dynamic exploration reveals an inverse relation between the two test measures: as the number of cores concurrently under test increases, system test latency decreases, but at the cost of significantly increased test time, which sacrifices system availability for running normal workloads. Under given system test latency constraints, which should be used to control system recovery time in the event of an error detection, our exploration framework identifies the scheduling policy under which overall test-time overhead is minimized and, hence, system availability is maximized. Without loss of generality, a 16-core system is explored in a full-system, execution-driven simulation framework running multi-threaded PARSEC workloads [1].
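The policy selection described above can be illustrated with a minimal sketch. It is not the authors' framework: the linear contention model, the function names `evaluate_policy` and `best_policy`, and all numeric values are hypothetical assumptions chosen only to reproduce the qualitative trade-off (more cores under test concurrently gives lower system test latency but higher total test time).

```python
def evaluate_policy(num_cores, k, base_test_time, contention_factor):
    """Return (system test latency, total test-time overhead) for a policy
    that tests k cores concurrently. Toy model, not measured data."""
    # Assumption: each core's test time grows linearly with the number of
    # other cores testing at the same time, as a stand-in for contention
    # on the shared LLC.
    per_core_time = base_test_time * (1 + contention_factor * (k - 1))
    rounds = num_cores // k                # k is assumed to divide num_cores
    latency = rounds * per_core_time       # time until every core is tested
    overhead = num_cores * per_core_time   # core-time taken from workloads
    return latency, overhead


def best_policy(num_cores, base_test_time, contention_factor, max_latency):
    """Pick the k that minimizes overhead subject to latency <= max_latency.

    Returns (k, latency, overhead), or None if no policy meets the constraint.
    """
    best = None
    for k in range(1, num_cores + 1):
        if num_cores % k != 0:
            continue  # keep the toy model simple: equal-sized rounds only
        latency, overhead = evaluate_policy(
            num_cores, k, base_test_time, contention_factor
        )
        if latency > max_latency:
            continue  # violates the system test latency constraint
        if best is None or overhead < best[2]:
            best = (k, latency, overhead)
    return best


if __name__ == "__main__":
    # Hypothetical 16-core system, unit test time, 25% slowdown per extra core.
    for k in (1, 2, 4, 8, 16):
        print(k, evaluate_policy(16, k, 1.0, 0.25))
    print(best_policy(16, 1.0, 0.25, max_latency=8.0))
```

In this toy model, tightening `max_latency` pushes the selected policy toward more cores under test at once, and therefore toward higher overhead and lower availability. In the paper, the latency and overhead values come from full-system simulation of PARSEC workloads rather than from a closed-form model.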