Latest Articles in Information Fusion

VisualRWKV-HM: Enhancing linear visual-language models via hybrid mixing
IF 14.7 · CAS Q1 · Computer Science
Information Fusion Pub Date: 2025-06-01 DOI: 10.1016/j.inffus.2025.103336
Haowen Hou, Fei Ma, Zihang Li, Fei Richard Yu

Abstract: With the success of Large Language Models, Visual Language Models (VLMs) have also developed rapidly. However, existing VLMs often face limitations due to their quadratic time and space complexity, which poses challenges for training and deployment. Linear VLMs have emerged as a solution, providing linear time and space complexity along with advantages in training and deployment. Nevertheless, a performance gap remains compared to state-of-the-art (SOTA) VLMs. This paper proposes VisualRWKV-HM, a model with linear complexity that incorporates a hybrid mixing mechanism combining time mixing and cross state mixing. This design achieves an optimal balance in information utilization, enhancing performance and offering flexibility for various tasks. VisualRWKV-HM achieves SOTA performance across single-image, multi-image, and multi-view benchmarks, and significantly outperforms the vanilla VisualRWKV. It demonstrates high computational efficiency at a context length of 24K, running 2.96 times faster and reducing memory usage by 45.38% compared to the Transformer-based LLaVA-1.5. Compared to LongLLaVA, a hybrid model based on the Transformer-Mamba architecture, it consumes less memory and achieves a 24% improvement in throughput at a context length of 16K. Additionally, we show that VisualRWKV-HM has strong scalability, with the potential for improved performance by scaling up the state encoder and decoder.

Citations: 0
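The abstract's key claim is that a recurrent "time mixing" state makes cost linear in sequence length, and a "cross state mixing" step injects visual information into that state. A minimal sketch of the idea follows; the decay constant, the scalar state, and the `cross_state_mix` interface are illustrative assumptions, not the paper's actual formulation.

```python
# Illustrative sketch of linear-complexity mixing: the state is updated
# once per token (O(n) total), unlike attention's O(n^2) pairwise scores.
# Decay value and blending weight are toy assumptions.

def time_mix(tokens, decay=0.9):
    """Fold a token sequence into a running state in O(n) time."""
    state = 0.0
    states = []
    for x in tokens:
        state = decay * state + (1.0 - decay) * x
        states.append(state)
    return states

def cross_state_mix(text_states, visual_state, alpha=0.5):
    """Blend each text-side state with a fixed visual-side state."""
    return [(1 - alpha) * s + alpha * visual_state for s in text_states]

text_states = time_mix([1.0, 2.0, 3.0, 4.0])
fused = cross_state_mix(text_states, visual_state=10.0)
```

Because each step touches only the previous state, memory stays constant in sequence length, which is the property the 24K-context efficiency numbers rely on.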
Fusion-Attention Diagnosis Network (FADNet): An end-to-end framework for optic disc segmentation and ocular disease classification
IF 14.7 · CAS Q1 · Computer Science
Information Fusion Pub Date: 2025-06-01 DOI: 10.1016/j.inffus.2025.103333
Yichen Xiao, Xuan Ding, Shengtao Liu, Yong Ma, Ting Zhang, Ziwei Xiang, Ruyi Zhang, Teruko Fukuyama, Jing Zhao, Yanze Yu, Xuejun Wang, Qinghong Lin, Yu Zhao, Guangyang Tian, Shiping Wen, Zhi Chen, Xingtao Zhou

Abstract: Hundreds of millions of people suffer from blindness and severe vision impairment due to pathologic myopia and other ocular illnesses, posing substantial worldwide public health issues. Accurate diagnosis and timely treatment of these conditions rely heavily on the precise segmentation of key anatomical structures in fundus images, such as the optic disc, which is essential for identifying disease types and enabling timely, effective clinical interventions. Although medical image analysis has made significant progress, existing methods often address segmentation and classification as separate tasks, resulting in limited performance and poor clinical applicability. In this work, we present an innovative end-to-end framework named Fusion-Attention Diagnosis Network (FADNet), which unifies ocular disease classification and optic disc segmentation. The core innovation of FADNet lies in its Dynamic Weighted Feature Fusion strategy, which seamlessly integrates the segmentation mask into the original fundus image using a context-aware weighting mechanism. This approach amplifies the contribution of pathological regions, enhancing feature relevance for subsequent classification. The framework first employs an Attention U-Net for accurate optic disc segmentation, followed by a ResNet-based classification network that diagnoses ocular diseases from the fused image. Experiments on the iChallenge-PM and Retina datasets indicate that FADNet attains state-of-the-art performance, achieving accuracies of 97.1% in binary classification and 90.4% in multi-class classification, surpassing current methods. FADNet outperforms previous methods through its joint optimization strategy, which improves the synergy between the segmentation and classification tasks, yielding notable improvements in diagnostic accuracy and robustness. FADNet demonstrates effectiveness and adaptability across multiple datasets, offering a comprehensive and practical solution for the automated diagnosis of ocular diseases, with significant potential for future deployment in clinical settings.

Citations: 0
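The fusion step described above, folding a segmentation mask back into the image so that pathological regions contribute more to classification, can be sketched in a few lines. The weighting form `pixel * (1 + w * mask)` is an assumption for illustration; FADNet's context-aware weighting is more elaborate.

```python
# Sketch of mask-weighted fusion: pixels inside the segmentation mask
# are amplified before being passed to the classifier. The (1 + w*mask)
# form and the weight w are toy assumptions, not the paper's mechanism.

def fuse(image, mask, w=0.5):
    """Re-weight an image so mask-covered regions contribute more."""
    return [[px * (1.0 + w * m) for px, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]

image = [[0.2, 0.8],
         [0.4, 0.6]]
mask  = [[0, 1],   # 1 marks the segmented optic-disc region
         [0, 1]]
fused = fuse(image, mask)
```

Unmasked pixels pass through unchanged, so the classifier still sees the full fundus image, just with the segmented region emphasized.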
Cognitive Disentanglement for Referring Multi-Object Tracking
IF 14.7 · CAS Q1 · Computer Science
Information Fusion Pub Date: 2025-05-31 DOI: 10.1016/j.inffus.2025.103349
Shaofeng Liang, Runwei Guan, Wangwang Lian, Daizong Liu, Xiaolou Sun, Dongming Wu, Yutao Yue, Weiping Ding, Hui Xiong

Abstract: As a significant application of multi-source information fusion in intelligent transportation perception systems, Referring Multi-Object Tracking (RMOT) involves localizing and tracking specific objects in video sequences based on language references. However, existing RMOT approaches often treat language descriptions as holistic embeddings and struggle to effectively integrate the rich semantic information in language expressions with visual features. This limitation is especially apparent in complex scenes requiring comprehensive understanding of both static object attributes and spatial motion information. In this paper, we propose a Cognitive Disentanglement for Referring Multi-Object Tracking (CDRMT) framework that addresses these challenges. It adapts the "what" and "where" pathways of the human visual processing system to RMOT tasks. Specifically, our framework first establishes cross-modal connections while preserving modality-specific characteristics. It then disentangles language descriptions and hierarchically injects them into object queries, refining object understanding from coarse to fine-grained semantic levels. Finally, we reconstruct language representations based on visual features, ensuring that tracked objects faithfully reflect the referring expression. Extensive experiments on different benchmark datasets demonstrate that CDRMT achieves substantial improvements over state-of-the-art methods, with average gains of 6.0% in HOTA score on Refer-KITTI and 3.2% on Refer-KITTI-V2. Our approach advances the state of the art in RMOT while providing new insights into multi-source information fusion.

Citations: 0
MATADOR: Multimodal traffic accident prediction enhanced by multi-source aggregated emotion recognition
IF 14.7 · CAS Q1 · Computer Science
Information Fusion Pub Date: 2025-05-31 DOI: 10.1016/j.inffus.2025.103335
Sainan Zhang, Rui Mao, Jun Zhang, Luwei Xiao, Erik Cambria

Abstract: While predicting traffic accidents is challenging, it is highly valuable, as it can greatly improve public safety. Previous studies have mostly relied on time-series data capturing drivers' physiological responses and behavior, along with vehicle movement, to predict collisions. However, focusing solely on the driver and target vehicle is insufficient, as driving behavior is also influenced by the external environment, especially in emergencies. In this work, we propose a multi-source aggregation model, termed MATADOR. The model aggregates and fuses data from various sources, including multimodal physiological and behavioral indicators of the driver, sensor data from the target vehicle and its surrounding vehicles, and static environmental data such as weather conditions and potential hazards, to predict traffic accidents. MATADOR is built upon multi-source feature extraction, multimodal data fusion, and a novel assisting mechanism to detect driver anger, recognize emotion intensity, and predict traffic accident probability over the following 1, 3, and 5 seconds, respectively. Thus, MATADOR can provide timely alerts to the driver, helping to prevent potential accidents. This proactive approach sets MATADOR apart from previous studies, highlighting its usefulness in real-world applications. Moreover, recognizing the strong link between drivers' emotional states and accidents, previous studies have utilized multi-task learning to enhance the accuracy of traffic accident prediction. However, they often treat tasks as isolated branches, failing to capture the dependencies between them. To tackle this challenge, we developed a dynamic assisting mechanism that allows the model to capture the influence of a driver's emotional state on accident prediction, thereby realizing task-relevance-driven dynamic optimization. Extensive experiments show that MATADOR significantly outperforms state-of-the-art methods in traffic accident prediction.

Citations: 0
Multimodal information fusion using pyramidal attention-based convolutions for underwater tri-dimensional scene reconstruction
IF 14.7 · CAS Q1 · Computer Science
Information Fusion Pub Date: 2025-05-30 DOI: 10.1016/j.inffus.2025.103339
Pedro Nuno Leite, Andry Maykol Pinto

Abstract: Underwater environments pose unique challenges to optical systems due to physical phenomena that induce severe data degradation. Current imaging sensors rarely address these effects comprehensively, creating the need to integrate complementary information sources. This article presents a multimodal data fusion approach that combines information from diverse sensing modalities into a single dense and accurate tri-dimensional representation. The proposed fusiNg tExture with apparent motion information for underwater Scene recOnstruction (NESO) encoder-decoder network leverages motion perception principles to extract relative depth cues, fusing them with texture information through an early fusion strategy. Evaluated on the FLSea-Stereo dataset, NESO outperforms state-of-the-art methods by 58.7%. Dense depth maps are achieved using multi-stage skip connections with attention mechanisms that ensure propagation of key features across network levels. This representation is further enhanced by incorporating sparse but millimeter-precise depth measurements from active imaging techniques. A regression-based algorithm maps depth displacements between these heterogeneous point clouds, using the estimated curves to refine the dense NESO prediction. This approach achieves relative errors as low as 0.41% when reconstructing submerged anode structures, accounting for metric improvements of up to 0.1124 m relative to the initial measurements. Validation at the ATLANTIS Coastal Testbed demonstrates the effectiveness of this multimodal fusion approach in obtaining robust tri-dimensional representations in real underwater conditions.

Citations: 0
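The refinement step above, fitting a regression between the dense network prediction and the sparse but precise active-imaging measurements, then applying it map-wide, can be illustrated with a toy example. A plain least-squares line is assumed here; the paper's estimated curves may take a different form, and all numbers are invented.

```python
# Sketch of regression-based depth refinement: learn a correction from
# the dense prediction to sparse precise measurements, then apply it to
# the whole depth map. Ordinary least squares is an assumption.

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

# Dense network predictions at the few pixels with precise sparse depth:
pred_at_sparse = [1.0, 2.0, 3.0]
sparse_depth   = [1.1, 2.1, 3.1]   # toy data: systematic +0.1 m offset

a, b = fit_line(pred_at_sparse, sparse_depth)
dense_map = [0.5, 1.5, 2.5]
refined = [a * d + b for d in dense_map]
```

Because the correction is fitted only where trustworthy measurements exist but applied everywhere, the dense map inherits the sparse sensor's accuracy without losing coverage.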
Large language models for automated scholarly paper review: A survey
IF 14.7 · CAS Q1 · Computer Science
Information Fusion Pub Date: 2025-05-30 DOI: 10.1016/j.inffus.2025.103332
Zhenzhen Zhuang, Jiandong Chen, Hongfeng Xu, Yuwen Jiang, Jialiang Lin

Abstract: Large language models (LLMs) have significantly impacted human society, influencing various domains. Among them, academia is not simply a domain affected by LLMs; it is also a pivotal force in their development. In academic publishing, this influence is reflected in the incorporation of LLMs into the peer review mechanism for reviewing manuscripts. LLMs hold transformative potential for the full-scale implementation of automated scholarly paper review (ASPR), but they also pose new issues and challenges that need to be addressed. In this survey paper, we aim to provide a holistic view of ASPR in the era of LLMs. We begin with a survey of which LLMs are used to conduct ASPR. We then review which ASPR-related technological bottlenecks have been solved through the incorporation of LLM technology, and explore the new methods, datasets, source code, and online systems that LLMs bring to ASPR. Furthermore, we summarize the performance and issues of LLMs in ASPR, and investigate the attitudes and reactions of publishers and academia toward ASPR. Lastly, we discuss the challenges and future directions associated with the development of LLMs for ASPR. This survey serves as an inspirational reference for researchers and can promote the progress of ASPR toward its actual implementation.

Citations: 0
THPFF: A tensor-based high-precision feature fusion model for multi-source data in smart healthcare systems
IF 14.7 · CAS Q1 · Computer Science
Information Fusion Pub Date: 2025-05-29 DOI: 10.1016/j.inffus.2025.103324
Songhe Yuan, Laurence T. Yang, Debin Liu, Xiaokang Wang, Jieming Yang

Abstract: Deep learning has revolutionized the field of medical analysis. However, its progress is often constrained by the heterogeneity of multi-sensor data and the lack of a unified predictive architecture. Visual Prompting (VP) emerges as a promising approach, enabling the efficient transfer of fusion knowledge from pre-trained models at lower computational cost. The intersection of VP and the medical field may yield unexpected results. This study delves into the potential of VP for medical image recognition, introducing a novel method named Dual Visual Prompt (DVP), which consists of Image-Feature Visual Prompting (IF-VP). This approach innovates by fusing input-level and feature-level prompts into a frozen image encoder, thereby boosting learning efficacy across both CNN- and CLIP-based VP. For Feature Prompts (FP), we propose an innovative methodology employing the Adaptive Energy-Weighted Tensor Decomposition (FP-AEWTD) technique to optimize feature extraction. Furthermore, we devise a Border Merging (BM) strategy that fortifies the stability of pre-trained classifiers' label confidence, specifically under CNN-based VP. IF-VP's performance was rigorously assessed across 12 distinct medical image recognition tasks, demonstrating its potential to be both precise and resource-efficient. On the ABIDE dataset in particular, VP-based training exhibited superior performance over full fine-tuning, achieving improvements of up to 11.9% on ResNet-18 and 7.7% on ResNeXt-101-32x8d. This research paves the way for further exploration of the scalability and adaptability of VP techniques in medical applications, potentially leading to broader implementations and innovations for smart healthcare.

Citations: 0
Uncertainty-aware traffic accident risk prediction via multi-view hypergraph contrastive learning
IF 14.7 · CAS Q1 · Computer Science
Information Fusion Pub Date: 2025-05-29 DOI: 10.1016/j.inffus.2025.103331
Yimei Zhang, Guojiang Shen, Wenyi Zhang, Kaili Ning, Renhe Jiang, Xiangjie Kong

Abstract: Traffic accident prediction is crucial for maintaining safety in smart cities. Accurate prediction can significantly reduce casualties and economic losses while alleviating public concerns about urban safety. However, achieving this is challenging. First, accident data exhibits a twofold imbalance: (i) a class imbalance between accident occurrence and non-occurrence, and (ii) a spatial distribution imbalance among different regions. Second, sporadic traffic accidents result in sparse supervised signals, limiting the spatial-temporal representations of conventional deep models. Lastly, the Gaussian assumption underlying previous deterministic deep learning models is unsuitable for accident risk data characterized by dispersion and many zeros. To address these challenges, we propose an Uncertainty-aware spatial-temporal multi-view hypergraph contrastive learning framework for Traffic accident risk prediction (TarU). This framework not only jointly captures local geographical spatial-temporal and global semantic dependencies from different views, but also parameterizes the probabilistic distribution of accident risk to quantify uncertainty. In particular, a hypergraph-enhanced network and an auxiliary contrastive learning architecture are designed to enhance self-discrimination among regions. Extensive experiments on two real-world datasets demonstrate the effectiveness of TarU. The proposed framework may also serve as a paradigm for spatial-temporal data mining tasks with sparse labels.

Citations: 0
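The abstract argues that a Gaussian is a poor fit for accident risk because the data is dispersed and dominated by zeros. One standard alternative for such data is a zero-inflated count distribution; the sketch below uses a zero-inflated Poisson purely as an illustration, since the abstract does not state which distribution TarU parameterizes.

```python
# Why a Gaussian fails for accident counts: most observations are zero.
# A zero-inflated Poisson mixes a point mass at zero with a Poisson
# count model. This is one standard choice, assumed for illustration.
import math

def zip_pmf(k, pi, lam):
    """P(K = k) under a zero-inflated Poisson with zero-mass pi."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return pi + (1 - pi) * poisson   # extra probability mass at zero
    return (1 - pi) * poisson

p0 = zip_pmf(0, pi=0.7, lam=2.0)   # dominated by the structural zeros
p1 = zip_pmf(1, pi=0.7, lam=2.0)
```

A symmetric Gaussian cannot place 70%+ of its mass exactly at zero while keeping a long right tail, which is why a parameterized count distribution quantifies this kind of uncertainty more faithfully.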
Transformers and large language models for efficient intrusion detection systems: A comprehensive survey
IF 14.7 · CAS Q1 · Computer Science
Information Fusion Pub Date: 2025-05-29 DOI: 10.1016/j.inffus.2025.103347
Hamza Kheddar

Abstract: With significant advancements in Transformers and large language models (LLMs), natural language processing (NLP) has extended its reach into many research fields thanks to its enhanced capabilities in text generation and user interaction. One field benefiting greatly from these advancements is cybersecurity. In cybersecurity, many parameters that need to be protected and exchanged between senders and receivers take the form of text and tabular data, making NLP a valuable tool for enhancing the security measures of communication protocols. This survey paper provides a comprehensive analysis of the utilization of Transformers and LLMs in cyber-threat detection systems. The methodology of paper selection and bibliometric analysis is outlined to establish a rigorous framework for evaluating existing research. The fundamentals of Transformers are discussed, including background information on various cyber-attacks and the datasets commonly used in this field. The survey explores the application of Transformers in intrusion detection systems (IDSs), covering different architectures such as attention-based models, LLMs like BERT and GPT, CNN/LSTM-Transformer hybrids, and emerging approaches like Vision Transformers (ViTs). It also explores the diverse environments and applications where Transformer- and LLM-based IDSs have been implemented, including computer networks, Internet of Things (IoT) devices, critical infrastructure protection, cloud computing, software-defined networking (SDN), and autonomous vehicles (AVs). The paper further addresses research challenges and future directions, identifying key issues such as interpretability, scalability, and adaptability to evolving threats. Finally, the conclusion summarizes the findings and highlights the significance of Transformers and LLMs in enhancing cyber-threat detection capabilities, while outlining potential avenues for further research and development.

Citations: 0
An interpretable integration fusion time-frequency prototype contrastive learning for machine fault diagnosis with limited labeled samples
IF 14.7 · CAS Q1 · Computer Science
Information Fusion Pub Date: 2025-05-29 DOI: 10.1016/j.inffus.2025.103340
Yutong Dong, Hongkai Jiang, Xin Wang, Mingzhe Mu

Abstract: The rise of Industry 4.0 and Industry 5.0, with their focus on digital transformation and human-machine collaboration, has boosted the need for advanced fault diagnosis technologies. These must be interpretable to ensure industrial efficiency, reliability, and safety. However, current methods often rely on single-sensor information, require many labeled samples for training, and struggle to justify diagnostic decisions. These limitations reduce their effectiveness in real-world production environments. To address these problems, this paper proposes an interpretable integration fusion time-frequency prototype contrastive learning (IIF-TFPCL) method for machine fault diagnosis with limited labeled samples. First, a data-level fusion method based on integrated Gini coefficient entropy is designed to achieve credible fusion of multi-sensor signals while enhancing the fault characteristics of the fused signals. Second, an interpretable wavelet feature fusion convolutional transformer architecture is constructed to achieve interpretable fault extraction from faulty signals. Then, a dual dynamic pseudo-labeling selection strategy is devised to efficiently choose high-confidence unlabeled samples from the originally imbalanced unlabeled data. In this process, a self-attention mechanism is employed to measure the correlation between unlabeled samples and initial prototypes. Finally, a time-frequency prototype contrastive loss is constructed to enhance the discriminative ability and robustness of the network in fault diagnosis tasks. IIF-TFPCL was validated using fused multi-sensor signals and various original single-sensor signals. The experiments show that it significantly outperforms the remaining seven comparison methods, demonstrating excellent fault identification performance and interpretability with limited labeled data.

Citations: 0
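The prototype-based idea at the core of the abstract, summarizing each fault class by a prototype computed from its few labeled samples and comparing new samples against those prototypes, can be sketched as follows. The two-class toy features and nearest-prototype rule are illustrative assumptions; the paper's time-frequency contrastive loss builds on this comparison rather than a hard nearest-neighbor decision.

```python
# Sketch of prototype-based classification: each class prototype is the
# mean of its labeled feature vectors; a sample is assigned to the class
# whose prototype is closest. Feature values are toy numbers.

def prototype(features):
    """Mean vector of a list of feature vectors (the class prototype)."""
    n = len(features)
    return [sum(col) / n for col in zip(*features)]

def nearest(sample, protos):
    """Index of the closest prototype by squared Euclidean distance."""
    dists = [sum((s - p) ** 2 for s, p in zip(sample, proto))
             for proto in protos]
    return dists.index(min(dists))

healthy = prototype([[0.1, 0.2], [0.2, 0.1]])   # class 0: two labeled samples
faulty  = prototype([[0.9, 1.0], [1.0, 0.8]])   # class 1: two labeled samples
label = nearest([0.95, 0.9], [healthy, faulty])
```

Because a prototype needs only a handful of labeled samples per class, this style of learning suits the limited-label regime the paper targets; a contrastive loss additionally pulls samples toward their own prototype and pushes them from the others.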