Why Reliability for Computing Needs Rethinking

2020 International Conference on Rebooting Computing (ICRC), Pub Date: 2020-12-01, DOI: 10.1109/ICRC2020.2020.00006

Valeriu Beiu, V. Dragoi, Roxana-Mariana Beiu


Citations: 5

Abstract

Offering high-quality services/products has been of paramount importance for both communications and computations. Early on, both of these were in dire need of practical designs for enhancing reliability. That is why John von Neumann proposed the first gate-level method (using redundancy to build reliable systems from unreliable components), while Edward F. Moore and Claude E. Shannon followed suit with the first device-level scheme. Moore and Shannon’s prescient paper also established network reliability as a probabilistic model where the nodes of the network were considered to be perfectly reliable, while the edges could fail independently with a certain probability. The fundamental problem was that of estimating the probability that (under given conditions) two (or more) nodes are connected, the solution being represented by the well-known reliability polynomial (of the network). This concept has been heavily used for communications, where big strides were made and applied to networks of roads, railways, power lines, fiber optics, phones, sensors, etc. For computations, the research community converged on the gate-level method proposed by von Neumann, while the device-level scheme crafted by Moore and Shannon, although very practical and detailed, did not inspire circuit designers and went under the radar. That scheme was built on a thought-provoking network called a hammock, exhibiting regular brick-wall near-neighbor connections. Trying to do justice to computing networks in general (and hammocks in particular), this paper aims to highlight and clarify how reliable different types of networks are when they are intended for performing computations. To do this, we will define quite a few novel cost functions which, together with established ones, will allow us to meticulously compare different types of networks for a clearer understanding of the reliability enhancements they are able to bring to computations. To our knowledge, this is the first-ever ranking of networks with respect to computing reliability. The main conclusion is that a rethinking/rebooting of how we should design reliable computing systems, immediately applicable to networks/arrays of devices (e.g., transistors or qubits), is both timely and needed.
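The reliability polynomial mentioned in the abstract can be illustrated with a minimal sketch. It assumes the model described above (perfectly reliable nodes, edges that work independently with probability p) and evaluates the two-terminal case by brute-force enumeration of edge states. The five-edge bridge network, the node numbering, and the function names are illustrative assumptions; they are not the hammock networks or the cost functions studied in the paper.

```python
from itertools import product


def connected(n_nodes, edges, up, s, t):
    """Return True if s reaches t using only the edges flagged as working."""
    adj = {v: [] for v in range(n_nodes)}
    for (u, v), ok in zip(edges, up):
        if ok:
            adj[u].append(v)
            adj[v].append(u)
    seen, stack = {s}, [s]
    while stack:  # depth-first search from s
        u = stack.pop()
        if u == t:
            return True
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return False


def reliability_coefficients(n_nodes, edges, s, t):
    """counts[k] = number of k-edge subsets that keep s and t connected."""
    counts = [0] * (len(edges) + 1)
    # Exhaustive enumeration: 2**len(edges) states, so small networks only.
    for up in product((False, True), repeat=len(edges)):
        if connected(n_nodes, edges, up, s, t):
            counts[sum(up)] += 1
    return counts


def two_terminal_reliability(n_nodes, edges, s, t, p):
    """Evaluate Rel(p) = sum_k counts[k] * p**k * (1 - p)**(m - k)."""
    m = len(edges)
    counts = reliability_coefficients(n_nodes, edges, s, t)
    return sum(c * p**k * (1 - p) ** (m - k) for k, c in enumerate(counts))


# Hypothetical example: a bridge network with terminals 0 and 3.
BRIDGE = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]

if __name__ == "__main__":
    print(reliability_coefficients(4, BRIDGE, 0, 3))  # [0, 0, 2, 8, 5, 1]
    for p in (0.1, 0.5, 0.9):
        print(p, two_terminal_reliability(4, BRIDGE, 0, 3, p))
```

Enumeration is exponential in the number of edges, so this only serves to make the definition concrete; it is not meant as a practical method for larger networks.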
