Performability Analysis of Mesh-Based NoCs Using Markov Reward Model

2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP). Pub Date: 2018-03-21. DOI: 10.1109/PDP2018.2018.00102

Jie Hou, M. Radetzki

Citations: 2

Abstract

Technology scaling makes it possible to implement systems with hundreds of processing cores, and thousands in the future. The communication in such systems is enabled by Networks-on-Chips (NoCs). A downside of technology scaling is the increased susceptibility to failures in NoC resources. Ensuring reliable operation despite such failures degrades NoC performance and may even invalidate the performance benefits expected from scaling. Thus, it is not enough to analyze performance and reliability in isolation, as usually done. Instead, we suggest treating both aspects together using the concept of performability and its analysis with Markov reward models. Our methodology is exemplified for mesh NoCs and transient faults but can be transferred to other topologies and fault models. We investigate how performability develops with scaling towards larger NoCs and explore the limits of scaling by determining the break-even failure rates under which scaling can achieve net performance increase.
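To illustrate what a Markov reward model computes, here is a minimal sketch that is not taken from the paper. It assumes a hypothetical two-state continuous-time Markov chain (fault-free and degraded), with a transient fault rate, a recovery rate, and one reward rate (for example, throughput) per state. All function names, parameters, and numbers are illustrative assumptions; the paper's actual mesh NoC model is not described in this abstract.

```python
# Hypothetical two-state Markov reward model (illustration only,
# not the model used in the paper).

def steady_state_two_state(fault_rate, recovery_rate):
    """Return steady-state probabilities (p_ok, p_degraded).

    State 'ok' moves to 'degraded' at fault_rate and 'degraded'
    returns to 'ok' at recovery_rate.
    """
    total = fault_rate + recovery_rate
    return recovery_rate / total, fault_rate / total


def performability(fault_rate, recovery_rate, reward_ok, reward_degraded):
    """Expected steady-state reward: sum of probability * reward rate."""
    p_ok, p_degraded = steady_state_two_state(fault_rate, recovery_rate)
    return p_ok * reward_ok + p_degraded * reward_degraded


def break_even_fault_rate(recovery_rate, reward_ok, reward_degraded, target):
    """Fault rate at which performability equals a target reward.

    Solves p_ok * reward_ok + p_degraded * reward_degraded = target.
    The target must lie strictly between the two reward rates.
    """
    if not reward_degraded < target < reward_ok:
        raise ValueError("target must lie between the two reward rates")
    return recovery_rate * (reward_ok - target) / (target - reward_degraded)


if __name__ == "__main__":
    # Assumed example values: reward 100 when fault-free, 50 when degraded.
    print(performability(1.0, 9.0, 100.0, 50.0))
    # Assumed target: the reward of a smaller, fault-free reference design.
    print(break_even_fault_rate(9.0, 100.0, 50.0, 95.0))
```

In this toy setting, the break-even fault rate is the rate above which the larger design's expected reward drops below the chosen reference value, which mirrors the kind of question the abstract raises about the limits of scaling.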



