Performability Analysis of Mesh-Based NoCs Using Markov Reward Model

Jie Hou, M. Radetzki
{"title":"Performability Analysis of Mesh-Based NoCs Using Markov Reward Model","authors":"Jie Hou, M. Radetzki","doi":"10.1109/PDP2018.2018.00102","DOIUrl":null,"url":null,"abstract":"Technology scaling makes it possible to implement systems with hundreds of processing cores, and thousands in the future. The communication in such systems is enabled by Networks-on-Chips (NoCs). A downside of technology scaling is the increased susceptibility to failures in NoC resources. Ensuring reliable operation despite such failures degrades NoC performance and may even invalidate the performance benefits expected from scaling. Thus, it is not enough to analyze performance and reliability in isolation, as usually done. Instead, we suggest treating both aspects together using the concept of performability and its analysis with Markov reward models. Our methodology is exemplified for mesh NoCs and transient faults but can be transferred to other topologies and fault models. We investigate how performability develops with scaling towards larger NoCs and explore the limits of scaling by determining the break-even failure rates under which scaling can achieve net performance increase.","PeriodicalId":333367,"journal":{"name":"2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2018-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PDP2018.2018.00102","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Technology scaling makes it possible to implement systems with hundreds of processing cores, and thousands in the future. The communication in such systems is enabled by Networks-on-Chips (NoCs). A downside of technology scaling is the increased susceptibility to failures in NoC resources. Ensuring reliable operation despite such failures degrades NoC performance and may even invalidate the performance benefits expected from scaling. Thus, it is not enough to analyze performance and reliability in isolation, as usually done. Instead, we suggest treating both aspects together using the concept of performability and its analysis with Markov reward models. Our methodology is exemplified for mesh NoCs and transient faults but can be transferred to other topologies and fault models. We investigate how performability develops with scaling towards larger NoCs and explore the limits of scaling by determining the break-even failure rates under which scaling can achieve net performance increase.
基于马尔可夫奖励模型的网格noc性能分析
技术扩展使得实现具有数百个处理核心的系统成为可能,未来可能会有数千个处理核心。这种系统中的通信是由片上网络(noc)实现的。技术扩展的一个缺点是NoC资源对故障的敏感性增加。在此类故障的情况下确保可靠的运行会降低NoC的性能,甚至可能使预期的扩展带来的性能优势失效。因此,像通常那样单独分析性能和可靠性是不够的。相反,我们建议使用可执行性的概念及其与马尔可夫奖励模型的分析来同时处理这两个方面。我们的方法适用于网状noc和瞬态故障,但可以转移到其他拓扑和故障模型。我们研究了性能如何随着扩展到更大的noc而发展,并通过确定盈亏平衡故障率来探索扩展的限制,在这种情况下,扩展可以实现净性能提升。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信