Query-guided multimodal learning network for anomaly detection underneath high-speed trains
Authors: Wei Liu, Xiaobo Lu, Yun Wei, Zhidan Ran
Journal: Information Fusion, Volume 126, Article 103530 (published 2025-07-28)
DOI: 10.1016/j.inffus.2025.103530
URL: https://www.sciencedirect.com/science/article/pii/S1566253525006025
Impact factor: 15.5; JCR Q1 (Computer Science, Artificial Intelligence)
Detecting anomalies on the underside of high-speed trains is crucial for train safety. However, the complexity and variability of the anomalies, together with the intricate environment in which they occur, make timely detection difficult. To address this issue, we propose the Query-guided Multimodal Learning Network (QMLNet), which exploits multimodal information to discover anomalies. In QMLNet, the CNN-Transformer Feature Fusion (CFF) module uses queries to guide the learning and fusion of multi-level visual features, enriching the features at each level and improving the prediction of anomaly masks. The Attention-based Mask Refinement (AMR) module generates attention-based masks to enhance the features, learns from the enhanced features using queries, and obtains a global representation of the features at different levels; these representations are used together with textual features to better predict anomaly categories. On our anomaly dataset, the proposed method outperforms state-of-the-art methods by a significant margin. It also achieves the best performance on VISION, a public defect detection dataset, significantly outperforming most methods, which demonstrates the generalization and robustness of our approach.
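The abstract does not say how the queries interact with the multi-level visual features. One common mechanism for query-guided feature learning is cross-attention, where each query attends over a set of feature tokens and returns a weighted average of them. The sketch below illustrates that idea in plain Python. It is an assumption for illustration, not QMLNet's actual CFF or AMR implementation: the function names, the toy vectors, and the choice to fuse levels by concatenating their tokens are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def query_cross_attention(queries, tokens):
    """Each query attends over all tokens and returns a weighted average.

    queries: list of vectors of length d (hypothetical learnable queries)
    tokens:  list of vectors of length d (flattened visual features)
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)  # scaled dot-product attention
    outputs = []
    for q in queries:
        scores = [scale * sum(qi * ti for qi, ti in zip(q, t)) for t in tokens]
        weights = softmax(scores)
        pooled = [sum(w * t[j] for w, t in zip(weights, tokens)) for j in range(d)]
        outputs.append(pooled)
    return outputs


def fuse_levels(queries, levels):
    """Assumed fusion scheme: pool the tokens of every feature level into one
    set so that the same queries attend across all levels at once."""
    all_tokens = [t for level in levels for t in level]
    return query_cross_attention(queries, all_tokens)


# Toy input: two queries, two feature levels (values are made up).
queries = [[1.0, 0.0], [0.0, 1.0]]
levels = [
    [[2.0, 0.0], [0.0, 2.0]],  # tokens from a hypothetical shallow level
    [[1.0, 1.0]],              # token from a hypothetical deep level
]
fused = fuse_levels(queries, levels)
```

Each row of `fused` is a query-specific summary of the features from all levels. In a trained network the queries and features would be learned tensors, and a mask or category head would consume these summaries.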
About the journal:
Information Fusion serves as a central platform for showcasing advances in multi-sensor, multi-source, and multi-process information fusion, fostering collaboration among the diverse disciplines that drive its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers presenting fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.