Novel Reliability Indicators From the Perspective of Data Center Networks

IF 5.7 · Region 2: Computer Science · JCR Q1: COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
Hongbin Zhuang;Xiao-Yan Li;Cheng-Kuan Lin;Ximeng Liu;Xiaohua Jia
DOI: 10.1109/TR.2024.3393133
Journal: IEEE Transactions on Reliability, vol. 74, no. 1, pp. 2459-2472
Publication date: 2024-03-09
Publication type: Journal Article
URL: https://ieeexplore.ieee.org/document/10527394/
Citations: 0

Abstract

Modern large-scale computing systems demand better connectivity indicators for reliability evaluation. However, as more processing units are rapidly incorporated into emerging computing systems, existing indicators (e.g., $\ell$-component edge connectivity and $\ell$-extra edge connectivity) increasingly fail to provide the required fault tolerance. In addition, these indicators require, for example, that the faulty network have at least $\ell$ components (or that each component have at least $\ell$ nodes). Such fault assumptions are not flexible enough to handle the diverse structural demands that arise in practice. To address both challenges, this article uses the partition matroid technique to propose two novel indicators of network reliability: matroidal connectivity and conditional matroidal connectivity. We first determine the exact values of the (conditional) matroidal connectivity of the $k$-ary $n$-cube $Q_{n}^{k}$, an appealing underlying topology for modern parallel computing systems. Moreover, we propose an $O(k^{n-1})$ algorithm for determining the structural features of minimum edge sets whose cardinality equals the conditional matroidal connectivity of $Q_{n}^{k}$. Simulation results verify the algorithm's correctness and further reveal the distribution pattern of edge sets subject to the partition matroid restriction. Comparative analyses show that our results offer higher edge fault tolerance than prior research, with an exponential improvement when $k\geq 4$.
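To make the objects in the abstract concrete, the following is a minimal Python sketch, not the authors' algorithm. It builds $Q_{n}^{k}$ using the standard definition (nodes are $n$-tuples over $\{0,\dots,k-1\}$, adjacent when they differ by $\pm 1 \bmod k$ in exactly one coordinate) and checks a faulty edge set against a partition-matroid style restriction. Grouping edges by dimension and capping the faults per dimension is an assumption made here for illustration; the abstract does not spell out the partition. The names `kary_ncube_edges`, `is_independent`, `is_connected`, and `caps` are hypothetical.

```python
from itertools import product


def kary_ncube_edges(k, n):
    """Return the edges of Q_n^k grouped by dimension (assumed partition)."""
    if k < 2 or n < 1:
        raise ValueError("need k >= 2 and n >= 1")
    blocks = {d: set() for d in range(n)}
    for u in product(range(k), repeat=n):
        for d in range(n):
            # Neighbor of u obtained by adding 1 (mod k) in coordinate d.
            v = u[:d] + ((u[d] + 1) % k,) + u[d + 1:]
            blocks[d].add(frozenset((u, v)))
    return blocks


def is_independent(faulty, blocks, caps):
    """Partition-matroid check: at most caps[d] faulty edges in block d."""
    return all(len(faulty & blocks[d]) <= caps[d] for d in blocks)


def is_connected(k, n, blocks, faulty):
    """Depth-first search over the surviving edges of Q_n^k."""
    alive = set().union(*blocks.values()) - faulty
    adj = {}
    for edge in alive:
        u, v = tuple(edge)
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    start = (0,) * n
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == k ** n


blocks = kary_ncube_edges(3, 2)          # the 3-ary 2-cube (a 3x3 torus)
faulty = set(list(blocks[0])[:2])        # two faulty edges in dimension 0
print(is_independent(faulty, blocks, {0: 2, 1: 0}))
print(is_connected(3, 2, blocks, faulty))
```

For $k \geq 3$ each dimension contributes $k^{n}$ edges, so the sketch is only practical for small $k$ and $n$; it is meant for experimenting with fault sets, not for reproducing the paper's $O(k^{n-1})$ procedure.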
Source journal
IEEE Transactions on Reliability (Engineering Technology - Engineering: Electrical & Electronic)
CiteScore: 12.20
Self-citation rate: 8.50%
Articles published: 153
Review time: 7.5 months
Journal description: IEEE Transactions on Reliability is a refereed journal for the reliability and allied disciplines including, but not limited to, maintainability, physics of failure, life testing, prognostics, design and manufacture for reliability, reliability for systems of systems, network availability, mission success, warranty, safety, and various measures of effectiveness. Topics eligible for publication range from hardware to software, from materials to systems, from consumer and industrial devices to manufacturing plants, from individual items to networks, from techniques for making things better to ways of predicting and measuring behavior in the field. As an engineering subject that supports new and existing technologies, we constantly expand into new areas of the assurance sciences.