An Efficient Soft Error Detection in Multicore Processors Running Server Applications

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP) Pub Date: 2016-02-01 DOI: 10.1109/PDP.2016.100

A. Tajary, H. Zarandi


Citations: 0

Abstract

This paper presents a throughput-aware transient fault detection method tailored to the characteristics of server processors. The method combines reconfigurable redundant-execution-based fault detection with speculative fault detection. In the reconfigurable redundant-execution scheme, a configuration manager module couples two free adjacent cores to execute a thread redundantly, and decouples them when resources become scarce for normal execution. This scheme exploits unused resources in multicore processors to provide reliable execution while keeping throughput high. The speculative scheme uses a history of the block addresses requested from the L1 cache to the L2 cache during thread execution to identify abnormal execution behavior. The method is evaluated using the Alpha processor model in the Gem5 simulator. The experimental results show that 70% of injected faults are detected with negligible hardware overhead.
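The coupling and decoupling policy described in the abstract can be illustrated with a small sketch. This is not the authors' design: the class `ConfigManager`, its methods, the linear core layout, and the rule for choosing which pair to decouple are all assumptions made for illustration. It only shows the idea of pairing two free adjacent cores for redundant execution and releasing the redundant core when no free core remains.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Core:
    thread: Optional[str] = None     # thread currently running on this core
    shadow_of: Optional[int] = None  # index of the leading core if this core runs a redundant copy


class ConfigManager:
    """Hypothetical configuration manager for a row of cores (illustrative only)."""

    def __init__(self, n_cores: int) -> None:
        self.cores: List[Core] = [Core() for _ in range(n_cores)]

    def _free(self, i: int) -> bool:
        return self.cores[i].thread is None

    def schedule(self, thread: str) -> str:
        # 1. Prefer two free adjacent cores: couple them for redundant execution.
        for i in range(len(self.cores) - 1):
            if self._free(i) and self._free(i + 1):
                self.cores[i].thread = thread
                self.cores[i + 1].thread = thread
                self.cores[i + 1].shadow_of = i
                return "coupled"
        # 2. Otherwise run on any single free core without redundancy.
        for i, core in enumerate(self.cores):
            if self._free(i):
                core.thread = thread
                return "single"
        # 3. No free core: decouple a pair and hand its redundant core to the new thread.
        #    (Assumed policy: take the first redundant core found.)
        for core in self.cores:
            if core.shadow_of is not None:
                core.thread = thread
                core.shadow_of = None
                return "decoupled"
        return "rejected"

    def protected_threads(self) -> set:
        # Threads that currently have a redundant copy running on a coupled core.
        return {c.thread for c in self.cores if c.shadow_of is not None}
```

With four cores, the first two threads each receive a coupled pair; a third thread forces one pair to be decoupled, so only one thread stays protected. This mirrors the throughput-aware trade-off in the abstract: redundancy is used only while cores would otherwise sit idle.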

