Interpretable optical network fault detection and localization with multi-task graph prototype learning

IF 4; CAS Zone 2 (Computer Science); JCR Q1, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
Xiaokang Chen;Xiaoliang Chen;Zuqing Zhu
Journal of Optical Communications and Networking, vol. 17, no. 9, pp. D73-D82, published 2025-07-18 (Journal Article). DOI: 10.1364/JOCN.562633
Citations: 0

Abstract

Recent advances in machine learning (ML) have promoted data-driven, automated fault management in optical networks. However, existing ML-aided fault management approaches mainly rely on black-box models that lack the intrinsic interpretability needed to establish their trustworthiness in mission-critical operation scenarios. In this paper, we propose an interpretable optical network fault detection and localization design that leverages multi-task graph prototype learning (MT-GPL). MT-GPL models an optical network and the optical performance monitoring data collected in it as graph-structured data, and uses graph neural networks to learn graph embeddings that capture both topological correlations (for fault localization) and fault-discriminative patterns (for root cause analysis). MT-GPL makes its reasoning interpretable by (i) introducing a prototype layer that uses the Monte Carlo tree search method to learn physics-aligned prototypes indicative of each fault class and (ii) making predictions based on the similarities between the embedding of an input graph and the learned prototypes. To enhance the scalability and interpretability of MT-GPL, we develop a multi-task architecture that performs fault localization and reasoning concurrently, with node-level and device-level prototype learning and fault prediction. Performance evaluations show that our proposal achieves more than 6.5% higher prediction accuracy than a multi-layer perceptron model, while visualizations of its reasoning process verify the validity of its interpretability.
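The prototype-based reasoning described above can be illustrated with a toy sketch. Everything in it is an assumption for illustration, not the authors' implementation: a single unweighted mean-aggregation round stands in for the trained graph neural network, mean pooling serves as the graph readout, and the log-ratio similarity log((d^2 + 1) / (d^2 + eps)) is borrowed from ProtoPNet because the abstract does not state which similarity MT-GPL uses. The node names, feature values, prototypes, and the `fiber_cut` class are hypothetical, hand-picked stand-ins rather than prototypes learned with Monte Carlo tree search. Graph-level scores play the role of fault prediction, and node-level similarity to a fault prototype plays the role of localization.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def mean_aggregate(features: Dict[str, Vector],
                   adjacency: Dict[str, Sequence[str]]) -> Dict[str, Vector]:
    """One message-passing round: each node averages its own and its neighbours' features.
    (Assumed stand-in for a trained GNN layer; no learned weights.)"""
    out: Dict[str, Vector] = {}
    for node, feat in features.items():
        group = [feat] + [features[n] for n in adjacency.get(node, ())]
        out[node] = [sum(dim) / len(group) for dim in zip(*group)]
    return out


def readout(node_emb: Dict[str, Vector]) -> Vector:
    """Mean-pool node embeddings into one graph-level embedding."""
    embs = list(node_emb.values())
    return [sum(dim) / len(embs) for dim in zip(*embs)]


def similarity(h: Vector, p: Vector, eps: float = 1e-4) -> float:
    """ProtoPNet-style log-ratio similarity: large when h is near prototype p, near 0 when far."""
    d2 = sum((a - b) ** 2 for a, b in zip(h, p))
    return math.log((d2 + 1.0) / (d2 + eps))


def classify(h: Vector, prototypes: Dict[str, List[Vector]]) -> Dict[str, float]:
    """Score each fault class by its best-matching prototype (max similarity)."""
    return {cls: max(similarity(h, p) for p in protos)
            for cls, protos in prototypes.items()}


def localize(node_emb: Dict[str, Vector], fault_prototype: Vector) -> str:
    """Return the node whose embedding is most similar to a node-level fault prototype."""
    return max(node_emb, key=lambda n: similarity(node_emb[n], fault_prototype))


# Hypothetical 3-node linear topology A-B-C with two made-up monitoring features
# per node; node C carries the anomalous reading.
features = {"A": [1.0, 0.0], "B": [1.0, 0.0], "C": [0.0, 1.0]}
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}

# Hand-picked stand-ins for learned prototypes (not values from the paper).
graph_prototypes = {"normal": [[1.0, 0.0]], "fiber_cut": [[0.7, 0.3]]}
node_fault_prototype = [0.5, 0.5]

node_emb = mean_aggregate(features, adjacency)
scores = classify(readout(node_emb), graph_prototypes)
print(max(scores, key=scores.get))               # fiber_cut
print(localize(node_emb, node_fault_prototype))  # C
```

Taking the maximum similarity per class keeps each decision traceable to a single prototype, which is the kind of transparency the abstract relies on; ProtoPNet itself instead feeds all similarities through a learned linear layer.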
Source journal metrics
CiteScore: 9.40
Self-citation rate: 16.00%
Articles published: 104
Review time: 4 months
Journal description: The scope of the Journal includes advances in the state-of-the-art of optical networking science, technology, and engineering. Both theoretical contributions (including new techniques, concepts, analyses, and economic studies) and practical contributions (including optical networking experiments, prototypes, and new applications) are encouraged. Subareas of interest include the architecture and design of optical networks, optical network survivability and security, software-defined optical networking, elastic optical networks, data and control plane advances, network management related innovation, and optical access networks. Enabling technologies and their applications are suitable topics only if the results are shown to directly impact optical networking beyond simple point-to-point networks.