Interpretable optical network fault detection and localization with multi-task graph prototype learning

IF 4 | Zone 2, Computer Science | JCR Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Journal of Optical Communications and Networking, Pub Date: 2025-07-18, DOI: 10.1364/JOCN.562633

Xiaokang Chen; Xiaoliang Chen; Zuqing Zhu

Journal of Optical Communications and Networking, Vol. 17, No. 9, pp. D73-D82

Citations: 0

Abstract

Recent advances in machine learning (ML) have promoted data-driven automated fault management in optical networks. However, existing ML-aided fault management approaches mainly rely on black-box models that lack the intrinsic interpretability needed to secure their trustworthiness in mission-critical operation scenarios. In this paper, we propose an interpretable optical network fault detection and localization design leveraging multi-task graph prototype learning (MT-GPL). MT-GPL models an optical network and the optical performance monitoring data collected in it as graph-structured data, and uses graph neural networks to learn graph embeddings that capture both topological correlations (for fault localization) and fault-discriminative patterns (for root cause analysis). MT-GPL interprets its reasoning by (i) introducing a prototype layer that learns physics-aligned prototypes indicative of each fault class using the Monte Carlo tree search method and (ii) making predictions based on the similarities between the embedding of an input graph and the learned prototypes. To enhance the scalability and interpretability of MT-GPL, we develop a multi-task architecture that performs concurrent fault localization and reasoning with node-level and device-level prototype learning and fault prediction. Performance evaluations show that our proposal achieves >6.5% higher prediction accuracy than the multi-layer perceptron model, while visualizations of its reasoning process verify the validity of its interpretability.
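The abstract describes predicting a fault class from the similarities between a graph embedding and learned prototypes. The following is a minimal pure-Python sketch of that general mechanism, not the authors' MT-GPL implementation. Several pieces are assumptions chosen for illustration:

- the mean-aggregation GNN layer;
- the ProtGNN-style log similarity;
- the summed per-class scores;
- the `localize` node-level head;
- the toy topology, feature values, and class names.

Training and the Monte Carlo tree search step that the abstract says MT-GPL uses to learn its prototypes are omitted.

```python
import math

# Sketch of prototype-based graph classification. Every function name, the
# layer design, and the similarity function are illustrative assumptions,
# not the MT-GPL code from the paper.

def message_passing(adj, feats, weight):
    """One mean-aggregation GNN layer with a self-loop, linear map, and ReLU.

    adj: {node: [neighbor, ...]}; feats: {node: [float, ...]};
    weight: d_in x d_out matrix given as a list of rows.
    """
    out = {}
    for v, nbrs in adj.items():
        group = [v] + nbrs  # include the node itself (self-loop)
        dim = len(feats[v])
        agg = [sum(feats[u][i] for u in group) / len(group) for i in range(dim)]
        out[v] = [max(0.0, sum(agg[i] * weight[i][j] for i in range(dim)))
                  for j in range(len(weight[0]))]
    return out

def readout(node_emb):
    """Mean-pool node embeddings into a single graph embedding."""
    rows = list(node_emb.values())
    return [sum(col) / len(rows) for col in zip(*rows)]

def similarity(z, p, eps=1e-4):
    """Assumed ProtGNN-style log similarity: grows as squared distance shrinks."""
    d = sum((a - b) ** 2 for a, b in zip(z, p))
    return math.log((d + 1.0) / (d + eps))

def classify(z, prototypes):
    """Score each class by summed similarity to its prototypes, then softmax.

    prototypes: {class_name: [prototype_vector, ...]}.
    """
    scores = {c: sum(similarity(z, p) for p in ps) for c, ps in prototypes.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exp.values())
    return {c: e / total for c, e in exp.items()}

def localize(node_emb, fault_prototype):
    """Hypothetical node-level head: pick the node most similar to a fault prototype."""
    return max(node_emb, key=lambda v: similarity(node_emb[v], fault_prototype))

# Toy 3-node linear topology with two made-up per-node monitoring features.
adj = {0: [1], 1: [0, 2], 2: [1]}
feats = {0: [0.9, 0.1], 1: [0.2, 0.8], 2: [0.1, 0.9]}
identity = [[1.0, 0.0], [0.0, 1.0]]
prototypes = {"normal": [[1.0, 0.0]], "fault": [[0.0, 1.0]]}

h = message_passing(adj, feats, identity)
probs = classify(readout(h), prototypes)
print(max(probs, key=probs.get), localize(h, [0.0, 1.0]))  # fault 2
```

Each class score is a sum of per-prototype similarity terms, so the prototype contributing the largest term indicates which learned pattern drove the decision. This is the kind of case-based explanation the abstract attributes to prototype layers.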


Source journal

Journal of Optical Communications and Networking (Engineering & Technology: Telecommunications)

CiteScore: 9.40

Self-citation rate: 16.00%

Articles published: 104

Review time: 4 months

About the journal: The scope of the Journal includes advances in the state-of-the-art of optical networking science, technology, and engineering. Both theoretical contributions (including new techniques, concepts, analyses, and economic studies) and practical contributions (including optical networking experiments, prototypes, and new applications) are encouraged. Subareas of interest include the architecture and design of optical networks, optical network survivability and security, software-defined optical networking, elastic optical networks, data and control plane advances, network management related innovation, and optical access networks. Enabling technologies and their applications are suitable topics only if the results are shown to directly impact optical networking beyond simple point-to-point networks.