Query-guided multimodal learning network for anomaly detection underneath high-speed trains

IF 15.5 · Zone 1, Computer Science · JCR Q1, Computer Science, Artificial Intelligence
Wei Liu, Xiaobo Lu, Yun Wei, Zhidan Ran
Information Fusion, vol. 126, Article 103530. Published 2025-07-28.
DOI: 10.1016/j.inffus.2025.103530
https://www.sciencedirect.com/science/article/pii/S1566253525006025
Citations: 0

Abstract

Anomaly detection on the bottom of high-speed trains is crucial for train safety. However, the complexity and variability of anomalies, along with the intricate environment in which they occur, pose significant challenges to timely detection. To address this issue, we propose the Query-guided Multimodal Learning Network (QMLNet) that exploits multimodal information to discover anomalies. Specifically, in QMLNet, the CNN-Transformer Feature Fusion (CFF) Module uses queries to guide the learning and fusion of multi-level visual features, enriching the expressiveness of features at each level for improved prediction of anomaly masks. The Attention-based Mask Refinement (AMR) module generates masks based on the attention mechanism to enhance the features, learns the features using queries, and obtains a global representation of the features at different levels, which will be used along with the textual features for better prediction of the anomaly categories. Compared to state-of-the-art methods, the proposed method achieves superior results on our anomaly dataset, leading by a significant margin. In addition, our method achieves the best performance on the public defect detection dataset, the VISION dataset, significantly outperforming most methods, which proves the generalization and robustness of our approach.
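
The abstract describes the query-guided idea only at a high level, so the following is a minimal sketch of the general pattern, not the authors' implementation. It assumes plain scaled dot-product cross-attention, a dot-product mask head on the finest feature level, and cosine similarity against text embeddings for category scores. Every function name, shape, and the random inputs are hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, tokens):
    """One residual cross-attention step (assumed form).

    queries: (Q, D) query vectors; tokens: (N, D) flattened visual features.
    Returns updated queries of shape (Q, D).
    """
    d = queries.shape[-1]
    attn = softmax(queries @ tokens.T / np.sqrt(d), axis=-1)  # (Q, N)
    return queries + attn @ tokens

def query_guided_fusion(queries, levels):
    """Let the same queries attend to each feature level in turn.

    levels: list of (H_i, W_i, D) feature maps, coarse to fine.
    Returns the fused queries and per-query mask logits on the finest level.
    """
    for fmap in levels:
        tokens = fmap.reshape(-1, fmap.shape[-1])
        queries = cross_attend(queries, tokens)
    h, w, d = levels[-1].shape
    mask_logits = queries @ levels[-1].reshape(-1, d).T  # (Q, H*W)
    return queries, mask_logits.reshape(-1, h, w)

def classify(queries, text_embeds):
    """Score each query against category text embeddings (cosine similarity)."""
    q = queries / np.linalg.norm(queries, axis=-1, keepdims=True)
    t = text_embeds / np.linalg.norm(text_embeds, axis=-1, keepdims=True)
    return softmax(q @ t.T, axis=-1)  # (Q, num_categories)

# Hypothetical toy inputs: 3 feature levels, 4 queries, 3 category embeddings.
rng = np.random.default_rng(0)
D, Q = 16, 4
levels = [rng.normal(size=(s, s, D)) for s in (4, 8, 16)]
queries = rng.normal(size=(Q, D))
text_embeds = rng.normal(size=(3, D))

fused_queries, masks = query_guided_fusion(queries, levels)
probs = classify(fused_queries, text_embeds)
print(masks.shape, probs.shape)
```

In this sketch each query yields one candidate mask and one category distribution. A trained model would use learned projections and learned text features instead of the random arrays used here.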
Source journal
Information Fusion (Engineering and Technology; Computer Science: Theory and Methods)
CiteScore: 33.20
Self-citation rate: 4.30%
Articles published: 161
Review time: 7.9 months
Journal description: Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.