N. Saxena, Chien Chen, R. Swami, H. Osone, Shalesh Thusoo, D. Lyon, D. Chang, Anand Dharmaraj, N. Patkar, Yizhi Lu, Ben-Hau Chia
Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers
Publication date: 1995-06-27
DOI: 10.1109/FTCS.1995.466952 (https://doi.org/10.1109/FTCS.1995.466952)
Citations: 11
Abstract
Error detection and handling in a superscalar, speculative out-of-order execution processor system
The HaL SPARC64 Processor, the first 64-bit SPARC-V9 architecture implementation, uses several techniques to ensure a high degree of system reliability, error detection, and error recovery. The CPU of the multi-chip module processor has a superscalar, speculative issue unit and an out-of-order execution datapath. These two processor components complicate the maintenance of precise state in the event of errors. By exploiting the SPARC-V9 architectural features and the micro-architecture for speculative execution, SPARC64 maintains precise state in the event of exceptions and errors, logs and reports errors, and facilitates error detection during full-system bring-up. The paper presents details of error detection and handling in the CPU, the cache system, and the Memory Management Unit (MMU). The HaL R1 system also implements a fault-secure memory system design. The memory system corrects all single-bit errors, detects double-bit errors, detects single address line failures, and detects all single dynamic RAM (DRAM) chip failures. Certain debug features that are useful during system bring-up have been added to the system.