Latest Articles in Information Fusion

Perceptive spectral transformer unfolding network with multiscale mixed training for arbitrary-scale hyperspectral and multispectral image fusion
IF 14.7, Q1, Computer Science
Information Fusion, Vol. 122, Article 103166. Pub Date: 2025-04-11. DOI: 10.1016/j.inffus.2025.103166
Bin Wang, Xingchuang Xiong, Yusheng Lian, Xuheng Cao, Han Zhou, Kun Yu, Zilong Liu
Abstract: Hyperspectral imaging offers rich spectral information but often suffers from a tradeoff between spatial and spectral resolution owing to hardware limitations. To address this, hyperspectral image (HSI)-multispectral image (MSI) fusion techniques have emerged, which combine low-resolution HSI (LR-HSI) with high-resolution MSI (HR-MSI) to generate HR-HSI. However, existing methods often struggle to generalize across varying image resolutions and lack interpretability because they rely on deep learning models without physical degradation constraints. This study introduces two major innovations to overcome these challenges: (1) a resolution-independent unfolding algorithm and a multiscale training framework, which allow flexible adaptation to LR-HSI of any resolution without increasing model complexity, thereby enhancing generalization in dynamic remote sensing environments; and (2) a novel degradation design that uses real LR-HSI and HR-MSI as priors to guide spatial-spectral degradation and implements degradation constraints in the external framework, thereby ensuring accurate approximation of the true degradation process. In addition, this study proposes a perceptive spectral transformer with perceptive spectral attention in a U-Net architecture to adaptively transfer spectral information, improving fusion accuracy. Experimental results highlight the advantages of our approach under both single-scale training and multiscale mixed training conditions. Compared with eight state-of-the-art fusion algorithms, our approach demonstrates exceptional performance under single-scale training; more importantly, multiscale mixed training further enhances its performance, achieving super-resolution magnifications ranging from 4× to 128× and validating the effectiveness of the framework. Experiments demonstrate the robustness and practical applicability of the proposed approach in both simulated and real-world scenarios; the code is available at https://github.com/XWangBin/PSTUN.
Citations: 0
TMF-Net: Multimodal smart contract vulnerability detection based on multiscale transformer fusion
IF 14.7, Q1, Computer Science
Information Fusion, Vol. 122, Article 103189. Pub Date: 2025-04-11. DOI: 10.1016/j.inffus.2025.103189
Tengfei Wang, Xiangfu Zhao, Jiarui Zhang
Abstract: Smart contracts are a crucial element of the blockchain ecosystem, offering advantages such as automation and decentralization. Nevertheless, frequent security breaches in smart contracts have led to considerable economic losses and substantial security risks. Although recent approaches have increasingly adopted multimodal strategies, substantial challenges remain in comprehensively extracting information from smart contracts and integrating data across modalities, which hinders capturing both the intricate semantics of smart contracts and the complex interactions across modalities. In this work, we propose TMF-Net, a novel multimodal method that uses a Multi-scale Transformer Fusion Network to detect smart contract vulnerabilities at a fine granularity. Our work aims to enhance accuracy by fusing multimodal data and extracting multiscale semantic features. In contrast to conventional techniques, TMF-Net combines three distinct modal information sources: contract graphs, bytecode text, and code images. These inputs are subjected to deep feature extraction using GCN, Bi-LSTM, and CNN models, respectively. Furthermore, TMF-Net introduces a multiscale transformer fusion module, which facilitates deep multiscale fusion of multimodal information through a multi-head attention mechanism and a multi-level encoder structure, enabling the model to learn a more comprehensive and fine-grained semantic representation of the code. Experimental results on the validated dataset demonstrate that TMF-Net achieves notable performance gains in detecting multiple vulnerability types. For example, for reentrancy exploits, the F1-score improves by 4.69% over state-of-the-art methods, which substantiates the efficacy of the multimodal learning and multiscale fusion strategies.
Citations: 0
MAENet: Boost image-guided point cloud completion more accurate and even
IF 14.7, Q1, Computer Science
Information Fusion, Vol. 122, Article 103179. Pub Date: 2025-04-09. DOI: 10.1016/j.inffus.2025.103179
Moyun Liu, Ziheng Yang, Bing Chen, Youping Chen, Jingming Xie, Lei Yao, Lap-Pui Chau, Jiawei Du, Joey Tianyi Zhou
Abstract: Point clouds offer a powerful representation for capturing real-world 3D structures, making them indispensable in applications such as robotics, autonomous driving, and embodied AI. However, due to sensor limitations and occlusions, captured point clouds are often incomplete, posing significant challenges for accurate 3D perception. Leveraging color images to guide point cloud completion has shown promise. In this paper, we propose a novel image-guided framework to advance this domain. Our approach generates two coarse point clouds: one reconstructed from the image and the other completed from the partial input. Unlike conventional unidirectional optimization methods that use images solely to enhance point clouds, we introduce a bi-directional pyramid optimization that fuses holistic shape cues from image-based reconstruction with stereo-rich details from point-based completion. This combination of complementary features significantly enhances completion accuracy. More importantly, we analyze the most advanced decoder used in recent image-guided point cloud completion methods and identify a key limitation: uneven point density. Uneven density can degrade geometric fidelity, lead to over-sampling in certain regions, and cause under-representation in others. To address this, we propose a novel propagation and density refinement process that balances point density and further refines the output. Extensive experiments show that our proposed More Accurate and Even Network (MAENet) not only surpasses all state-of-the-art methods in terms of accuracy but also generates more uniform point clouds. Furthermore, MAENet shows strong generalization performance on unseen categories. The code will be made available at https://github.com/lmomoy/MAENet.
Citations: 0
LATTE: A Real-time Lightweight Attention-based Traffic Accident Anticipation Engine
IF 14.7, Q1, Computer Science
Information Fusion, Vol. 122, Article 103173. Pub Date: 2025-04-09. DOI: 10.1016/j.inffus.2025.103173
Jiaxun Zhang, Yanchen Guan, Chengyue Wang, Haicheng Liao, Guohui Zhang, Zhenning Li
Abstract: Accurately predicting traffic accidents in real time is a critical challenge in autonomous driving, particularly in resource-constrained environments. Existing solutions often suffer from high computational overhead or fail to adequately address the uncertainty of evolving traffic scenarios. This paper introduces LATTE, a Lightweight Attention-based Traffic Accident Anticipation Engine, which combines computational efficiency with state-of-the-art performance. LATTE employs Efficient Multiscale Spatial Aggregation (EMSA) to capture spatial features across scales, Memory Attention Aggregation (MAA) to enhance temporal modeling, and Auxiliary Self-Attention Aggregation (AAA) to extract latent dependencies over extended sequences. Additionally, LATTE incorporates the Flamingo Alert-Assisted System (FAA), which leverages a vision-language model to provide real-time, cognitively accessible verbal hazard alerts, improving passenger situational awareness. Evaluations on benchmark datasets (DAD, CCD, A3D) demonstrate LATTE's superior predictive capability and computational efficiency. LATTE achieves a state-of-the-art 89.74% Average Precision (AP) on the DAD benchmark, with a 5.4% higher mean Time-To-Accident (mTTA) than the second-best model, and maintains a competitive mean TTA at a recall of 80% (TTA@R80) of 4.04 s, demonstrating robust accident anticipation across diverse driving conditions. Its lightweight design delivers a 93.14% reduction in floating-point operations (FLOPs) and a 31.58% decrease in parameter count (Params), enabling real-time operation on resource-limited hardware without compromising performance. Ablation studies confirm the effectiveness of LATTE's architectural components, while visualizations and failure-case analyses highlight its practical applicability and areas for enhancement. Our code is available at https://github.com/icypear/LATTE.git.
Citations: 0
Language-guided reasoning segmentation for underwater images
IF 14.7, Q1, Computer Science
Information Fusion, Vol. 122, Article 103177. Pub Date: 2025-04-09. DOI: 10.1016/j.inffus.2025.103177
Mingde Yao, King Man Tam, Menglu Wang, Lingen Li, Rei Kawakami
Abstract: In this paper, we introduce Language-Guided Reasoning Segmentation (LGRS), a framework that leverages human language instructions to guide underwater image segmentation. Unlike existing methods that rely solely on visual cues or predefined categories, LGRS segments underwater images based on detailed, context-aware textual descriptions, allowing it to tackle more challenging scenarios such as distinguishing visually similar objects or identifying species from complex queries. To facilitate the development and evaluation of this approach, we create an underwater image-language segmentation dataset, the first of its kind, which pairs underwater images with detailed textual descriptions and corresponding segmentation masks. This dataset provides a foundation for training models capable of processing both visual and linguistic inputs simultaneously. Furthermore, LGRS incorporates reasoning capabilities through large language models, enabling the system to interpret complex relationships between objects in the scene and perform accurate segmentation in dynamic underwater environments. Notably, our method also demonstrates strong zero-shot segmentation capabilities, enabling the model to generalize to unseen categories without additional training. Experimental results show that LGRS outperforms existing underwater image segmentation methods in both accuracy and flexibility, offering a foundation for further advancements.
Citations: 0
MGF-GCN: Multimodal interaction Mamba-aided graph convolutional fusion network for semantic segmentation of remote sensing images
IF 14.7, Q1, Computer Science
Information Fusion, Vol. 122, Article 103150. Pub Date: 2025-04-08. DOI: 10.1016/j.inffus.2025.103150
Yanfeng Zhao, Linwei Qiu, Zhenjian Yang, Yadong Chen, Yunjie Zhang
Abstract: With the continuous improvement of the spatial resolution of remote sensing images, semantic segmentation of high-resolution remote sensing data has made significant progress. However, because unimodal data provide limited information and different object classes can appear similar or be confused in complex scenes, the performance of unimodal segmentation models remains limited. To address these challenges, this paper proposes a multimodal semantic segmentation network that combines visible-light data (RGB/IRRG) with digital surface model (DSM) data to provide richer object information. We employ different backbone networks to process the visible-light and DSM data separately and design a Height-Aware Graph Convolution (HAGC) strategy to effectively capture the spatial correlations between the two modalities. Additionally, we introduce a Multimodal Hierarchical Interaction Mamba (MHIMamba) module to fuse and process features from different modalities, achieving feature complementarity and enhancing segmentation performance. Finally, we apply a Progressive Context Cascade Decoder (PCCD) to recover spatial details. This work pioneers the integration of Mamba into multimodal semantic segmentation, facilitating effective cross-modal feature interaction and improving segmentation accuracy in remote sensing imagery. Experimental results demonstrate that our model achieves state-of-the-art segmentation performance on the Potsdam and Vaihingen datasets. Our code is available at https://github.com/zyf-cell/MGF-GCN.
Citations: 0
Magnetic source imaging registration based on self-supervised learning and multi-view differentiable rendering
IF 14.7, Q1, Computer Science
Information Fusion, Vol. 122, Article 103161. Pub Date: 2025-04-08. DOI: 10.1016/j.inffus.2025.103161
Xingwen Fu, Yuqing Yang, Yidi Cao, Qiuyu Han, Xuanbo Guo, Yu Xu, Xiaolin Ning
Abstract: High-sensitivity miniature optically pumped magnetometers (OPMs) and magnetoencephalography (MEG) technologies enable precise localization of brain magnetic field sources and their activity. Magnetic source imaging (MSI) relies on accurate registration of MEG with magnetic resonance imaging (MRI) to achieve high localization precision. Current registration algorithms typically depend on head point clouds reconstructed by optical scanners, which are then used to compute the coordinate transformation matrix between MEG and MRI. However, the complex process of reconstructing head point clouds not only increases operational difficulty and complicates automation but also entails an unavoidable preparation time of 2-3 min, which is not user-friendly for non-expert operators or emergency patients. To address this, we propose a new MSI registration method based on self-supervised learning and multi-view differentiable rendering. This method eliminates the cumbersome head point cloud reconstruction and instead achieves registration from several images taken at different angles. We treat the transformation matrix as an optimizable model parameter; by comparing the differences between the captured images and the rendered images, we use differentiable rendering to propagate gradients and optimize the transformation matrix, gradually bringing it closer to the true value. Experimental results show that this method reduces the preparation time to 30 s, significantly simplifies the operation, and achieves registration accuracy surpassing that of structured-light scanners and approaching the precision of laser scanners. Furthermore, the required equipment cost is only 56% that of a structured-light scanner and 17% that of a laser scanner. This advantage facilitates the widespread application of wearable OPM-MEG systems.
Citations: 0
PEARL: A dual-layer graph learning for multimodal recommendation
IF 14.7, Q1, Computer Science
Information Fusion, Vol. 122, Article 103168. Pub Date: 2025-04-08. DOI: 10.1016/j.inffus.2025.103168
Yuzhuo Dang, Wanyu Chen, Zhiqiang Pan, Yuxiao Duan, Fei Cai, Honghui Chen
Abstract: Multimodal recommendation has increasingly become a mainstream information service technology that transcends traditional recommender systems based on user-item interactions by integrating the multimodal features of items. Although existing works have made notable progress by focusing on user-item interaction graph structures and self-supervised learning to enhance multimodal representation learning, they still exhibit two limitations: (1) performing graph convolution operations on a fixed interaction graph introduces misleading noisy signals caused by users' imbalanced attention across modalities, and (2) the lack of exploration of inherent self-supervised signals in multimodal attributes fails to mitigate the distribution bias introduced during data augmentation. To address these issues, we propose a novel method named Purified-intEraction and Affinity gRaph Learning (PEARL) for multimodal recommendation, which utilizes dual-layer graph learning to model user preference. Specifically, to eliminate misleading noisy signals, we design a graph purification strategy that constructs purified modality-specific interaction graphs by removing noisy edges from the raw interaction graph. User and item affinity graphs are then constructed based on user co-occurrences and item multimodal features, respectively, and used for message passing to mine the implicit self-supervised signals between similar users or items. After that, we propose an augmentation-free contrastive learning task in the fusion module to improve the quality of ID embeddings and multimodal features, ultimately generating the final representations of users and items. Comprehensive experimental results on three publicly available datasets verify the superiority of PEARL over diverse state-of-the-art baselines, with improvements that are especially noticeable on large-scale datasets. Our experimental code and data are available at https://github.com/Yuzhuo-Dang/PEARL.
Citations: 0
Pre-training Enhanced Transformer for multivariate time series anomaly detection
IF 14.7, Q1, Computer Science
Information Fusion, Vol. 121, Article 103171. Pub Date: 2025-04-08. DOI: 10.1016/j.inffus.2025.103171
Chao Wang, Haochen Shi, Jie Hu, Xin Yang, Junbo Zhang, Shengdong Du, Tianrui Li
Abstract: In a multitude of data-driven industries, the timely detection of anomalies and the assurance of service reliability have become of paramount importance. Recently, the Transformer has emerged as a prominent approach for multivariate time series anomaly detection, largely due to its exceptional capacity to model global dependencies. Its ability to capture long-term dependencies substantially improves the precision of anomaly detection. However, given the infrequency of anomalies or anomalous events in time series, local anomaly information has a limited impact on global temporal patterns. Furthermore, compared with convolutional neural networks (CNNs), the Transformer is relatively weak at extracting fine-grained local feature patterns, which leaves its global modeling of time series lacking local contextual information. To solve this problem, we present a distinctive unsupervised framework in which the Transformer is Enhanced by a scalable Pre-training module (TEP). In particular, we devise a pre-training module that effectively extracts feature patterns from time series, yielding segment-level representations. These representations furnish useful contextual information to the Transformer, enhancing its ability to reconstruct time series. Furthermore, we integrate a form of universal anomaly-feature knowledge into the model, amplifying the discrepancy between normal and anomalous points and helping the model capture anomalies more effectively. Experimental evaluations on three publicly available real-world datasets demonstrate that our framework markedly enhances the Transformer's reconstruction capability for time series and improves the model's anomaly detection performance.
Citations: 0
Artificial intelligence-enabled detection and assessment of Parkinson’s disease using multimodal data: A survey
IF 14.7, Q1, Computer Science
Information Fusion, Vol. 121, Article 103175. Pub Date: 2025-04-07. DOI: 10.1016/j.inffus.2025.103175
Aite Zhao, Yongcan Liu, Xinglin Yu, Xinyue Xing, Huiyu Zhou
Abstract: Highly adaptable and reusable AI models are revolutionizing the diagnosis and management of Parkinson's disease (PD). A wide range of AI algorithms, including machine learning and deep learning techniques, are now being employed for PD diagnosis and treatment. These algorithms leverage multimodal data, such as gait patterns, hand movements, and speech characteristics, to predict the likelihood of PD, evaluate symptom severity, facilitate early detection, and assess the effectiveness of treatments, demonstrating their advanced diagnostic potential. This paper provides a comprehensive review of machine learning and deep learning approaches for PD detection and assessment over the past decade, emphasizing their strengths, addressing their limitations, and exploring their potential to inspire new research directions. Additionally, it offers a curated collection of publicly available multimodal datasets focused on PD motor symptoms and validates the performance of these algorithms on privately collected datasets. Experimental results reveal that AI technologies for PD assessment have progressed from traditional methods to convolutional and sequential neural networks, and further to Transformer-based models, achieving consistently improving accuracy and establishing benchmarks for future advancements.
Citations: 0