Query-guided multimodal learning network for anomaly detection underneath high-speed trains

IF 15.5, Zone 1 (Computer Science), Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

Information Fusion, Pub Date: 2025-07-28, DOI: 10.1016/j.inffus.2025.103530

Wei Liu, Xiaobo Lu, Yun Wei, Zhidan Ran

Information Fusion, volume 126, Article 103530. Publication type: Journal Article. Full text: https://www.sciencedirect.com/science/article/pii/S1566253525006025

Citations: 0

Abstract


Query-guided multimodal learning network for anomaly detection underneath high-speed trains

Detecting anomalies underneath high-speed trains is crucial for train safety. However, the complexity and variability of anomalies, together with the intricate environment in which they occur, make timely detection difficult. To address this issue, we propose the Query-guided Multimodal Learning Network (QMLNet), which exploits multimodal information to discover anomalies. In QMLNet, the CNN-Transformer Feature Fusion (CFF) module uses queries to guide the learning and fusion of multi-level visual features, enriching the features at each level and improving the prediction of anomaly masks. The Attention-based Mask Refinement (AMR) module generates attention-based masks to enhance the features, learns from the enhanced features using queries, and obtains a global representation of the features at different levels; this representation is used together with textual features to better predict anomaly categories. Compared with state-of-the-art methods, the proposed method achieves superior results on our anomaly dataset by a significant margin. It also achieves the best performance on the public defect detection dataset VISION, significantly outperforming most methods, which supports the generalization and robustness of our approach.
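The abstract does not give the internals of the CFF module, so the following is only a generic illustration of what "queries guiding the fusion of multi-level features" can look like, not the authors' implementation. It is a minimal NumPy sketch in which a small set of queries cross-attends to each feature level in turn. The function names (`query_cross_attention`, `fuse_levels`), the residual update, the absence of learned projections, and all dimensions are assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def query_cross_attention(queries, features):
    """Update queries by attending over one level of flattened features.

    queries:  (num_queries, dim)
    features: (num_tokens, dim), e.g. an H*W feature map flattened to tokens
    Returns the updated queries and the (num_queries, num_tokens) attention map.
    """
    dim = queries.shape[-1]
    # Scaled dot-product attention; no learned projections (assumption).
    attn = softmax(queries @ features.T / np.sqrt(dim), axis=-1)
    # Residual update: each query adds a weighted summary of the features.
    return queries + attn @ features, attn

def fuse_levels(queries, feature_levels):
    # Hypothetical fusion loop: the same queries visit each level in turn,
    # so later levels are read in the context of earlier ones.
    attn_maps = []
    for feats in feature_levels:
        queries, attn = query_cross_attention(queries, feats)
        attn_maps.append(attn)
    return queries, attn_maps

rng = np.random.default_rng(0)
dim, num_queries = 16, 4
# Three feature levels with different spatial sizes (8x8, 4x4, 2x2), flattened.
levels = [rng.standard_normal((n, dim)) for n in (64, 16, 4)]
queries = rng.standard_normal((num_queries, dim))

fused, attn_maps = fuse_levels(queries, levels)
print(fused.shape)                   # (4, 16)
print([a.shape for a in attn_maps])  # [(4, 64), (4, 16), (4, 4)]
```

In this sketch, each attention map has one row per query and one column per spatial token, so it could be reshaped into a coarse per-query mask over that level. Whether QMLNet derives its masks this way is not stated in the abstract.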


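Source journal

Information Fusion (Engineering & Technology: Computer Science, Theory & Methods)

CiteScore: 33.20

Self-citation rate: 4.30%

Articles published: 161

Review time: 7.9 months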

About the journal: Information Fusion serves as a central platform for showcasing advances in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers presenting fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.