Interpretable optical network fault detection and localization with multi-task graph prototype learning
Authors: Xiaokang Chen; Xiaoliang Chen; Zuqing Zhu
Journal of Optical Communications and Networking, vol. 17, no. 9, pp. D73-D82, published 2025-07-18
DOI: 10.1364/JOCN.562633
URL: https://ieeexplore.ieee.org/document/11085029/
Journal metrics: impact factor 4.0; JCR Q1 (Computer Science, Hardware & Architecture)
Citations: 0
Abstract
Recent advances in machine learning (ML) have promoted data-driven, automated fault management in optical networks. However, existing ML-aided approaches mainly rely on black-box models that lack the intrinsic interpretability needed to secure their trustworthiness in mission-critical operation scenarios. In this paper, we propose an interpretable optical network fault detection and localization design that leverages multi-task graph prototype learning (MT-GPL). MT-GPL models an optical network, together with the optical performance monitoring data collected in it, as graph-structured data and uses graph neural networks to learn graph embeddings that capture both topological correlations (for fault localization) and fault-discriminative patterns (for root cause analysis). MT-GPL makes its reasoning interpretable by (i) introducing a prototype layer that uses Monte Carlo tree search to learn physics-aligned prototypes indicative of each fault class and (ii) making predictions based on the similarities between the embedding of an input graph and the learned prototypes. To enhance the scalability and interpretability of MT-GPL, we develop a multi-task architecture that performs fault localization and reasoning concurrently, with node-level and device-level prototype learning and fault predictions. Performance evaluations show that our proposal achieves $>6.5\%$ higher prediction accuracy than a multi-layer perceptron model, while visualizations of its reasoning processes verify the validity of its interpretability.
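The prototype-based prediction step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the similarity function, the embedding size, or the fault classes, so everything below is an assumption made for illustration: the inverse-distance `similarity`, the 2-D embeddings, the labels `normal`, `faulty`, `class_a`, and `class_b`, and the simplification that both task heads read the same embedding vector. It is not the authors' implementation, and it omits the graph neural network and the Monte Carlo tree search used to learn the prototypes.

```python
def similarity(embedding, prototype):
    """Assumed similarity: approaches 1 as squared Euclidean distance goes to 0."""
    d2 = sum((e - p) ** 2 for e, p in zip(embedding, prototype))
    return 1.0 / (1.0 + d2)


def predict(embedding, prototypes):
    """Score each class by its most similar prototype and pick the best class.

    prototypes: dict mapping a class label to a list of prototype vectors.
    Returns (predicted_label, per_class_scores).
    """
    scores = {
        label: max(similarity(embedding, p) for p in protos)
        for label, protos in prototypes.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical prototypes for two task heads (values are illustrative only).
node_prototypes = {"normal": [[0.0, 0.0]], "faulty": [[1.0, 1.0]]}
device_prototypes = {"class_a": [[1.0, 0.0]], "class_b": [[0.0, 1.0]]}

# Hypothetical embedding, standing in for the output of a graph neural network.
embedding = [0.9, 0.8]

node_label, node_scores = predict(embedding, node_prototypes)
device_label, device_scores = predict(embedding, device_prototypes)
print(node_label, device_label)
```

Because each score is the similarity to a specific prototype, the prototype that produced the winning score can be shown next to the prediction, which is what makes this style of classifier inspectable.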
About the journal:
The scope of the Journal includes advances in the state-of-the-art of optical networking science, technology, and engineering. Both theoretical contributions (including new techniques, concepts, analyses, and economic studies) and practical contributions (including optical networking experiments, prototypes, and new applications) are encouraged. Subareas of interest include the architecture and design of optical networks, optical network survivability and security, software-defined optical networking, elastic optical networks, data and control plane advances, network management related innovation, and optical access networks. Enabling technologies and their applications are suitable topics only if the results are shown to directly impact optical networking beyond simple point-to-point networks.