A Path-Based Topology-Agnostic Fault Diagnosis Strategy for Multiprocessor Systems
Authors: Lin Chen; Hao Feng; Jiong Wu
Journal: IEEE Transactions on Computers, vol. 74, no. 6, pp. 1886-1896 (published 2025-02-26)
DOI: 10.1109/TC.2025.3543701
URL: https://ieeexplore.ieee.org/document/10906474/
Impact factor: 3.6 (JCR Q2, Computer Science, Hardware & Architecture)
Fault diagnosis locates faulty processors in multiprocessor systems and plays a crucial role in ensuring system stability, security, and reliability. A widely used approach is system-level diagnosis, which determines processor status by interpreting the set of test results between adjacent processors. The PMC and MM models are two commonly used models for generating these results. The diversity and complexity of network topologies restrict existing algorithms to specific topologies, and the limitations of existing diagnosis strategies reduce fault tolerance. In this paper, we present a novel path-based method for fault diagnosis in various networks under the PMC and MM models. First, we introduce an algorithm for partitioning a path into subpaths under these models. To ensure that at least one subpath is diagnosed as fault-free, we derive the relationship between the fault bound $T$ and the path length $N$. Then, building on methods for recognizing subpath states, we develop fault diagnosis algorithms for both the PMC and MM models. Simulation results show that the proposed algorithms diagnose faults in multiprocessor systems quickly and accurately.
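The abstract does not spell out the partitioning algorithm or the exact relationship between $T$ and $N$, so the following is only a minimal sketch of the general idea under the standard PMC assumption: a fault-free tester reports the true status of its neighbor, while a faulty tester's report is arbitrary. The function names (`pmc_syndrome`, `split_zero_chains`, `certainly_fault_free`) and the cut-at-every-1 rule are illustrative assumptions, not the authors' method. The sketch relies on one well-known fact: in a chain of tests that all returned 0, the faulty nodes can only form a prefix, so with at most `t` faults every node past the first `t` positions of a chain must be fault-free.

```python
import random


def pmc_syndrome(faulty, n, rng):
    """Simulate the tests u_i -> u_{i+1} along a path of n nodes (PMC model)."""
    results = []
    for i in range(n - 1):
        if i in faulty:
            # A faulty tester may report anything.
            results.append(rng.randint(0, 1))
        else:
            # A fault-free tester reports the true status of its neighbor.
            results.append(1 if (i + 1) in faulty else 0)
    return results


def split_zero_chains(results):
    """Cut the path at every test that returned 1 (hypothetical partition rule)."""
    chains, current = [], [0]
    for i, r in enumerate(results):
        if r == 1:
            chains.append(current)
            current = [i + 1]
        else:
            current.append(i + 1)
    chains.append(current)
    return chains


def certainly_fault_free(chains, t):
    """Nodes that must be fault-free when at most t nodes are faulty.

    Inside a chain whose tests all returned 0, a fault-free node forces
    every later node to be fault-free, so faulty nodes form a prefix of
    length at most t.
    """
    good = set()
    for chain in chains:
        good.update(chain[t:])
    return good
```

For example, the syndrome `[0, 0, 1, 0, 0, 0]` on a 7-node path splits into the chains `[0, 1, 2]` and `[3, 4, 5, 6]`, and with `t = 2` the sketch marks nodes 2, 5, and 6 as certainly fault-free. A chain longer than `t` is what guarantees a non-empty result, which is the kind of condition a bound relating $T$ and $N$ would need to secure.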
About the journal:
The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.