Image and Vision Computing: Latest Articles

Early progression detection from MCI to AD using multi-view MRI for enhanced assisted living
IF 4.2 | CAS Tier 3 (Computer Science)
Image and Vision Computing, Volume 157, Article 105491. Pub Date: 2025-03-08. DOI: 10.1016/j.imavis.2025.105491
Nasir Rahim, Naveed Ahmad, Waseem Ullah, Jatin Bedi, Younhyun Jung
Abstract: Alzheimer's disease (AD) is a progressive neurodegenerative disorder. Early detection is crucial for timely intervention and treatment to improve assisted living. Although magnetic resonance imaging (MRI) is a widely used neuroimaging modality for the diagnosis of AD, most studies focus on a single MRI plane, missing comprehensive spatial information. In this study, we proposed a novel approach that leverages multiple MRI planes (axial, coronal, and sagittal) from 3D MRI volumes to predict progression from stable mild cognitive impairment (sMCI) to progressive MCI (pMCI) and AD. We employed a set of convolutional neural networks, including EfficientNet-B7, ConvNeXt, and DenseNet-121, to extract deep features from each MRI plane, followed by a feature enhancement step through an attention module. The optimized feature set was then passed through a Bayesian-optimized pool of classification heads (multilayer perceptron (MLP), long short-term memory (LSTM), and multi-head attention (MHA)) to obtain the most effective model for each MRI plane. The optimal model for each plane was then integrated into homogeneous and heterogeneous ensembles to further enhance performance. Using the ADNI dataset, the proposed model achieved 91% accuracy, 87% sensitivity, 88% specificity, and 92% AUC. To enhance interpretability, we used the Grad-CAM explainability technique to generate attention maps for each MRI plane, which identified critical brain regions affected by disease progression; these attention maps revealed consistent patterns of tissue damage across the MRI scans. The results demonstrate the effectiveness of combining multi-plane MRI data with ensemble learning and attention mechanisms to improve the early detection and tracking of AD progression in patients with MCI, offering a more comprehensive diagnostic tool and enhanced clinical decision-making. The datasets, results, and code used to conduct the analysis are made available to the research community at https://github.com/nasir3843/Early_Progression_detection_MCI-to_AD
Citations: 0
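The pipeline sketched in this abstract (per-plane CNN features, attention-based feature enhancement, a classification head per plane, and an ensemble across planes) can be illustrated with a minimal PyTorch sketch. The tiny backbone, squeeze-and-excitation style attention, class count, and averaging ensemble below are illustrative assumptions, not the authors' released implementation; the names PlaneBackbone, SEAttention, and MultiPlaneClassifier are hypothetical.

import torch
import torch.nn as nn

class PlaneBackbone(nn.Module):
    """Tiny stand-in for a per-plane CNN (EfficientNet/ConvNeXt/DenseNet in the paper)."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, out_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
    def forward(self, x):                      # x: (B, 1, H, W), one MRI plane
        return self.features(x).flatten(1)     # (B, out_dim)

class SEAttention(nn.Module):
    """Squeeze-and-excitation style re-weighting as a generic attention stand-in."""
    def __init__(self, dim, r=4):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim, dim // r), nn.ReLU(),
                                  nn.Linear(dim // r, dim), nn.Sigmoid())
    def forward(self, f):
        return f * self.gate(f)

class MultiPlaneClassifier(nn.Module):
    """One backbone + attention + head per plane; predictions averaged as a simple ensemble."""
    def __init__(self, planes=("axial", "coronal", "sagittal"), dim=128, num_classes=3):
        super().__init__()
        self.planes = planes
        self.backbones = nn.ModuleDict({p: PlaneBackbone(dim) for p in planes})
        self.attention = nn.ModuleDict({p: SEAttention(dim) for p in planes})
        self.heads = nn.ModuleDict({p: nn.Linear(dim, num_classes) for p in planes})
    def forward(self, views):                  # views: dict plane -> (B, 1, H, W)
        logits = [self.heads[p](self.attention[p](self.backbones[p](views[p])))
                  for p in self.planes]
        return torch.stack(logits, dim=0).mean(dim=0)   # ensemble by averaging plane logits

model = MultiPlaneClassifier()
dummy = {p: torch.randn(2, 1, 96, 96) for p in ("axial", "coronal", "sagittal")}
print(model(dummy).shape)   # torch.Size([2, 3]) -> e.g. sMCI / pMCI / AD logits

In the paper, the head for each plane is additionally selected from an MLP/LSTM/MHA pool via Bayesian optimization; here a single linear head stands in for that step.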
An edge-aware high-resolution framework for camouflaged object detection
IF 4.2 | CAS Tier 3 (Computer Science)
Image and Vision Computing, Volume 157, Article 105487. Pub Date: 2025-03-07. DOI: 10.1016/j.imavis.2025.105487
Jingyuan Ma, Tianyou Chen, Jin Xiao, Xiaoguang Hu, Yingxun Wang
Abstract: Camouflaged objects are often seamlessly assimilated into their surroundings and exhibit indistinct boundaries. The complex environmental conditions and the high intrinsic similarity between camouflaged targets and their backgrounds present significant challenges in accurately locating and fully segmenting these objects. Although existing methods have achieved remarkable performance across various real-world scenarios, they still struggle with challenging cases such as small targets, thin structures, and blurred boundaries. To address these issues, we propose a novel edge-aware high-resolution network. Specifically, we design a High-Resolution Feature Enhancement Module to exploit multi-scale features while preserving local details. Furthermore, we introduce an Edge Prediction Module to generate high-quality edge prediction maps. Subsequently, we develop an Attention-Guided Fusion Module to effectively leverage the edge prediction maps. With these key modules, the proposed model achieves real-time performance at 58 FPS and surpasses 21 state-of-the-art algorithms across six standard evaluation metrics. Source code will be publicly available at https://github.com/clelouch/EHNet.
Citations: 0
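One plausible reading of the Edge Prediction Module plus Attention-Guided Fusion Module combination is that the predicted edge map is turned into a spatial attention mask over the segmentation features. The sketch below shows only that generic pattern; the class name, channel sizes, and residual formulation are assumptions rather than the EHNet design.

import torch
import torch.nn as nn

class EdgeGuidedFusion(nn.Module):
    """Minimal sketch of fusing an edge prediction map back into segmentation features.
    The edge map acts as spatial attention; a residual path keeps the original features."""
    def __init__(self, channels=64):
        super().__init__()
        self.edge_head = nn.Conv2d(channels, 1, kernel_size=3, padding=1)   # edge prediction
        self.refine = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels), nn.ReLU())
    def forward(self, feat):                       # feat: (B, C, H, W)
        edge_logits = self.edge_head(feat)         # would be supervised with edge ground truth
        attn = torch.sigmoid(edge_logits)          # (B, 1, H, W) boundary attention
        fused = self.refine(feat * attn + feat)    # emphasize boundary regions, keep residual
        return fused, edge_logits

m = EdgeGuidedFusion(64)
fused, edge = m(torch.randn(1, 64, 128, 128))
print(fused.shape, edge.shape)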
MUNet: A lightweight Mamba-based Under-Display Camera restoration network
IF 4.2 | CAS Tier 3 (Computer Science)
Image and Vision Computing, Volume 156, Article 105486. Pub Date: 2025-03-06. DOI: 10.1016/j.imavis.2025.105486
Wenxin Wang, Boyun Li, Wanli Liu, Xi Peng, Yuanbiao Gou
Abstract: Under-Display Camera (UDC) restoration aims to recover the underlying clean images from the degraded images captured by UDC. Although promising results have been achieved, most existing UDC restoration methods still suffer from two vital obstacles in practice: (i) existing UDC restoration models are parameter-intensive, and (ii) most of them struggle to capture long-range dependencies within high-resolution images. To overcome these drawbacks, we study a challenging problem in UDC restoration, namely, how to design a lightweight UDC restoration model that can capture long-range image dependencies. To this end, we propose a novel lightweight Mamba-based UDC restoration network (MUNet) consisting of two modules, named Separate Multi-scale Mamba (SMM) and Separate Convolutional Feature Extractor (SCFE). Specifically, SMM exploits our proposed alternate scanning strategy to efficiently capture long-range dependencies across multi-scale image features. SCFE preserves local dependencies through convolutions with various receptive fields. Thanks to SMM and SCFE, MUNet achieves state-of-the-art lightweight UDC restoration performance with significantly fewer parameters, making it well-suited for deployment on mobile devices. Our code will be made available after acceptance.
Citations: 0
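A faithful SMM reproduction would need a selective state-space (Mamba) implementation, so the sketch below illustrates only the complementary SCFE idea from the abstract: preserving local dependencies with parallel convolutions of different receptive fields. The kernel sizes, residual connection, and class name are illustrative assumptions.

import torch
import torch.nn as nn

class LocalFeatureExtractor(nn.Module):
    """Sketch of an SCFE-style block: parallel convolutions with different receptive
    fields capture local detail at several scales; results are concatenated and projected."""
    def __init__(self, channels=32):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=k, padding=k // 2)
            for k in (1, 3, 5, 7)                       # increasing receptive fields
        ])
        self.project = nn.Conv2d(4 * channels, channels, kernel_size=1)
        self.act = nn.GELU()
    def forward(self, x):
        local = torch.cat([b(x) for b in self.branches], dim=1)
        return x + self.act(self.project(local))        # residual keeps the block's input intact

x = torch.randn(1, 32, 64, 64)
print(LocalFeatureExtractor(32)(x).shape)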
Adaptive scale matching for remote sensing object detection based on aerial images
IF 4.2 | CAS Tier 3 (Computer Science)
Image and Vision Computing, Volume 157, Article 105482. Pub Date: 2025-03-06. DOI: 10.1016/j.imavis.2025.105482
Lu Han, Nan Li, Zeyuan Zhong, Dong Niu, Bingbing Gao
Abstract: Remote sensing object detection based on aerial images is challenging due to complex backgrounds, and exploiting specific contextual information can enhance detection accuracy. Inadequate long-range background information may lead to erroneous detection of small remotely sensed objects, with variations in background complexity observed across different object types. In this paper, we propose a new YOLO-based real-time object detector, named YOLO-SM, which aims to Scale-Match the proportions of various objects in remote sensing images. Specifically, this paper proposes a straightforward yet highly efficient building block that dynamically adjusts the necessary receptive field for each object, minimizing the loss of feature information caused by consecutive convolutions. Additionally, a supplementary bottom-up pathway is incorporated to improve the representation of smaller objects. Empirical evaluations conducted on the DOTA-v1.0, DOTA-v1.5, DIOR-R, and HRSC2016 datasets confirm the efficacy of the proposed methodology. On DOTA-v1.0, compared to RTMDet-R-L, YOLO-SM-S achieves competitive accuracy while reducing parameters by 74.8% and FLOPs by 78.5%. Compared to LSKNet on HRSC2016, YOLO-SM-Tiny reduces parameters by 76% and FLOPs by 90% and improves FPS by about three times while maintaining stable accuracy.
Citations: 0
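The building block that "dynamically adjusts the necessary receptive field for each object" can be loosely illustrated with a selective-kernel-style layer: parallel dilated convolutions cover different receptive fields and a learned soft weighting mixes them per sample. This is a generic sketch under that assumption, not the actual YOLO-SM block.

import torch
import torch.nn as nn

class AdaptiveReceptiveField(nn.Module):
    """Parallel dilated convs + learned per-branch weights (selective-kernel style sketch)."""
    def __init__(self, channels=64, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d) for d in dilations])
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.select = nn.Conv2d(channels, len(dilations), kernel_size=1)  # per-branch scores
    def forward(self, x):
        feats = torch.stack([b(x) for b in self.branches], dim=1)   # (B, K, C, H, W)
        weights = torch.softmax(self.select(self.gap(x)), dim=1)    # (B, K, 1, 1)
        weights = weights.unsqueeze(2)                               # (B, K, 1, 1, 1)
        return (feats * weights).sum(dim=1)                          # receptive field mixed per sample

y = AdaptiveReceptiveField()(torch.randn(2, 64, 40, 40))
print(y.shape)   # torch.Size([2, 64, 40, 40])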
Deep learning for brain tumor segmentation in multimodal MRI images: A review of methods and advances
IF 4.2 | CAS Tier 3 (Computer Science)
Image and Vision Computing, Volume 156, Article 105463. Pub Date: 2025-03-04. DOI: 10.1016/j.imavis.2025.105463
Bin Jiang, Maoyu Liao, Yun Zhao, Gen Li, Siyu Cheng, Xiangkai Wang, Qingling Xia
Abstract: Background and Objectives: Image segmentation is crucial in applications like image understanding, feature extraction, and analysis. The rapid development of deep learning techniques in recent years has significantly enhanced the field of medical image processing, with the segmentation of tumors from brain MRI images emerging as a particularly active area of interest within the medical science community. Existing reviews predominantly focus on traditional CNNs and Transformer models but lack systematic analysis and experimental validation on the application of the emerging Mamba architecture to multimodal brain tumor segmentation, the handling of missing modalities, the potential of multimodal fusion strategies, and the heterogeneity of datasets. Methods: This paper provides a comprehensive literature review of recent deep learning-based methods for multimodal brain tumor segmentation using multimodal MRI images, including performance and quantitative analysis of state-of-the-art approaches. It focuses on the handling of multimodal fusion, adaptation techniques, and missing modalities, while also delving into the performance, advantages, and disadvantages of deep learning models such as U-Net, Transformer, hybrid deep learning, and Mamba-based methods in segmentation tasks. Results: Throughout the review process, it was found that most researchers prefer Transformer-based U-Net and Mamba-based U-Net models, especially fusion models combining U-Net and Mamba, for image segmentation.
Citations: 0
Dense small target detection algorithm for UAV aerial imagery
IF 4.2 | CAS Tier 3 (Computer Science)
Image and Vision Computing, Volume 156, Article 105485. Pub Date: 2025-03-04. DOI: 10.1016/j.imavis.2025.105485
Sheng Lu, Yangming Guo, Jiang Long, Zun Liu, Zhuqing Wang, Ying Li
Abstract: Unmanned aerial vehicle (UAV) aerial images make dense small target detection challenging due to complex backgrounds, small object sizes in a wide field of view, low resolution, and dense target distributions. Many aerial target detection networks and attention-based methods have been proposed to enhance dense small target detection, but problems remain, such as insufficient extraction of effective information and missed or false detections of small targets in dense areas. Therefore, this paper proposes a novel dense small target detection algorithm (DSTDA) for UAV aerial images suitable for various high-altitude complex environments. The core components of the proposed DSTDA are the multi-axis attention units, the adaptive feature transformation mechanism, and the target-guided sample allocation strategy. Firstly, by introducing the multi-axis attention units into DSTDA, the limitation on global information perception can be addressed, so that the detailed features and spatial relationships of small targets at long distances can be sufficiently extracted. Secondly, an adaptive feature transformation mechanism is designed to flexibly adjust the feature map according to the characteristics of the target distribution, enabling DSTDA to focus more on densely populated target areas. Lastly, a target-guided sample allocation strategy is presented, combining coarse screening based on positional information and fine screening guided by target prediction information. By employing this dynamic coarse-to-fine sample allocation, the detection performance on small and dense targets in complex backgrounds is further improved. These innovative improvements provide DSTDA with enhanced global perception and target-focusing capabilities, effectively addressing the challenges of detecting dense small targets in complex aerial scenes. Experimental validation was conducted on three publicly available datasets: VisDrone, SIMD, and CARPK. The results show that DSTDA outperforms other state-of-the-art algorithms in terms of comprehensive performance, significantly reducing false alarms and missed detections in drone-based target detection while maintaining remarkable accuracy and real-time performance, and proving proficient at detecting dense small targets in drone scenarios.
Citations: 0
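The coarse-to-fine sample allocation described in the abstract (a positional pre-screen followed by a screen guided by prediction quality) can be illustrated in a few lines. The radius, top-k value, and scoring inputs below are arbitrary stand-ins for whatever criteria DSTDA actually uses.

import torch

def assign_positives(anchor_centers, gt_center, pred_scores, radius=32.0, topk=9):
    """Coarse-to-fine positive sample selection (illustrative sketch).
    anchor_centers: (N, 2) candidate locations; gt_center: (2,) target center;
    pred_scores: (N,) predicted quality/classification score for the target class."""
    # Coarse screen: keep candidates whose centers fall near the ground-truth center.
    dist = torch.linalg.norm(anchor_centers - gt_center, dim=1)
    candidates = torch.nonzero(dist < radius, as_tuple=False).squeeze(1)
    if candidates.numel() == 0:
        return candidates
    # Fine screen: among the positional candidates, keep the top-k by predicted quality.
    k = min(topk, candidates.numel())
    best = pred_scores[candidates].topk(k).indices
    return candidates[best]

centers = torch.rand(100, 2) * 640
pos = assign_positives(centers, torch.tensor([320.0, 320.0]), torch.rand(100))
print(pos)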
DAN: Distortion-aware Network for fisheye image rectification using graph reasoning
IF 4.2 | CAS Tier 3 (Computer Science)
Image and Vision Computing, Volume 156, Article 105423. Pub Date: 2025-03-03. DOI: 10.1016/j.imavis.2025.105423
Yongjia Yan, Hongzhe Liu, Cheng Zhang, Cheng Xu, Bingxin Xu, Weiguo Pan, Songyin Dai, Yiqing Song
Abstract: Despite their wide field of view, the application of fisheye images is still hindered by the presence of distortions. Existing learning-based methods still suffer from artifacts and loss of detail, especially at the image edges. To address this, we introduce the Distortion-aware Network (DAN), a novel deep network architecture for fisheye image rectification that leverages graph reasoning. Specifically, we employ the superior relational understanding capability of graph technology to associate distortion patterns in different regions, generating an accurate and globally consistent unwarping flow. Meanwhile, during image reconstruction, we utilize deformable convolution to construct same-resolution feature blocks and employ skip connections to supplement detailed information. Additionally, we introduce a weight decay-based multi-scale loss function, enabling the model to focus more on accuracy at high-resolution layers while enhancing the model's generalization ability. To address the lack of quantitative evaluation standards for real fisheye images, we propose a new metric called the "Line Preservation Metric." Through qualitative and quantitative experiments on PLACE365, COCO2017, and real fisheye images, the proposed method proves to outperform existing methods in terms of performance and generalization.
Citations: 0
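One way to read the weight decay-based multi-scale loss is as supervision of the predicted unwarping flow at several resolutions, with per-scale weights that decay toward coarser levels so that high-resolution accuracy dominates. The decay factor, L1 loss, and function signature below are assumptions for illustration, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def multiscale_flow_loss(pred_flows, gt_flow, decay=0.5):
    """pred_flows: list of (B, 2, H_i, W_i) predictions ordered fine to coarse;
    gt_flow: (B, 2, H, W) full-resolution ground-truth unwarping flow.
    Each coarser scale gets its weight multiplied by `decay`, so fine scales dominate."""
    loss, weight = 0.0, 1.0
    for flow in pred_flows:
        gt = F.interpolate(gt_flow, size=flow.shape[-2:], mode="bilinear", align_corners=False)
        loss = loss + weight * F.l1_loss(flow, gt)
        weight *= decay
    return loss

gt = torch.randn(2, 2, 256, 256)
preds = [torch.randn(2, 2, s, s) for s in (256, 128, 64)]
print(multiscale_flow_loss(preds, gt))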
Spatial cascaded clustering and weighted memory for unsupervised person re-identification
IF 4.2 | CAS Tier 3 (Computer Science)
Image and Vision Computing, Volume 156, Article 105478. Pub Date: 2025-03-03. DOI: 10.1016/j.imavis.2025.105478
Jiahao Hong, Jialong Zuo, Chuchu Han, Ruochen Zheng, Ming Tian, Changxin Gao, Nong Sang
Abstract: Recent advancements in unsupervised person re-identification (re-ID) have demonstrated high performance by leveraging fine-grained local context; such approaches are often referred to as part-based methods. However, many existing part-based methods rely on horizontal division to obtain local contexts, leading to misalignment issues caused by varying human poses. Moreover, misalignment of semantic information within part features hampers the effectiveness of metric learning, thereby limiting the potential of part-based methods. These challenges result in under-utilization of part features in existing approaches. To address these issues, we introduce the Spatial Cascaded Clustering and Weighted Memory (SCWM) method. SCWM aims to parse and align more accurate local contexts for different human body parts while allowing the memory module to balance hard-example mining and noise suppression. Specifically, we first analyze the issues of foreground omission and spatial confusion in previous methods. We then propose foreground and space corrections to enhance the completeness and reasonableness of human parsing results. Next, we introduce a weighted memory with two weighting strategies that address hard-sample mining for global features and enhance noise resistance for part features, enabling better utilization of both global and part features. Extensive experiments conducted on the Market-1501, DukeMTMC-reID, and MSMT17 datasets validate the effectiveness of the proposed method over numerous state-of-the-art methods.
Citations: 0
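A weighted memory in this setting typically means a per-cluster feature bank updated with a momentum rule, where a per-sample weight balances hard-example mining against noisy pseudo-labels. The momentum value, weighting rule, and class name below are illustrative assumptions, not the SCWM implementation.

import torch
import torch.nn.functional as F

class WeightedClusterMemory:
    """Sketch of a cluster-level memory bank for unsupervised re-ID.
    Stores one L2-normalized prototype per pseudo-label cluster and updates it with a
    momentum rule scaled by a per-sample weight (e.g., down-weighting noisy samples)."""
    def __init__(self, num_clusters, dim, momentum=0.2):
        self.bank = F.normalize(torch.randn(num_clusters, dim), dim=1)
        self.momentum = momentum
    def update(self, feats, labels, weights):
        feats = F.normalize(feats, dim=1)
        for f, y, w in zip(feats, labels, weights):
            m = self.momentum * w                      # confident samples move the prototype more
            self.bank[y] = F.normalize((1 - m) * self.bank[y] + m * f, dim=0)
    def logits(self, feats, temperature=0.05):
        # Similarity of query features to all cluster prototypes, used as contrastive logits.
        return F.normalize(feats, dim=1) @ self.bank.t() / temperature

mem = WeightedClusterMemory(num_clusters=10, dim=64)
mem.update(torch.randn(4, 64), torch.tensor([0, 1, 1, 3]), torch.tensor([1.0, 0.5, 0.9, 0.2]))
print(mem.logits(torch.randn(4, 64)).shape)   # torch.Size([4, 10])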
Video Wire Inpainting via Hierarchical Feature Mixture
IF 4.2 | CAS Tier 3 (Computer Science)
Image and Vision Computing, Volume 157, Article 105460. Pub Date: 2025-03-03. DOI: 10.1016/j.imavis.2025.105460
Zhong Ji, Yimu Su, Yan Zhang, Shuangming Yang, Yanwei Pang
Abstract: Video wire inpainting aims at automatically eliminating visible wires from film footage, significantly streamlining post-production workflows. Previous models address redundancy in wire removal by eliminating redundant blocks to enhance focus on crucial wire details for more accurate reconstruction. However, once redundancy is removed, the disorganized non-redundant blocks disrupt temporal and spatial coherence, making seamless inpainting challenging. The absence of multi-scale feature fusion further limits the model's ability to handle different wire scales and to blend inpainted regions with complex backgrounds. To address these challenges, we propose a Hierarchical Feature Mixture Network (HFM-Net) that integrates two novel modules: a Hierarchical Transformer Module (HTM) and a Spatio-temporal Feature Mixture Module (SFM). Specifically, the HTM employs redundancy-aware attention modules and lightweight transformers to reorganize and fuse key high- and low-dimensional patches. The lightweight transformers are sufficient because fewer non-redundant blocks need to be processed. By aggregating similar features, these transformers guide the alignment of non-redundant blocks and achieve effective spatio-temporal synchronization. Building on this, the SFM incorporates gated convolutions and a GRU to further enhance spatial and temporal integration: gated convolutions fuse low- and high-dimensional features, while the GRU captures temporal dependencies, enabling seamless inpainting of dynamic wire patterns. Additionally, we introduce a lightweight 3D separable convolution discriminator to improve video quality during inpainting while reducing computational costs. Experimental results demonstrate that HFM-Net achieves state-of-the-art performance on the video wire removal task.
Citations: 0
Real-time localization and navigation method for autonomous vehicles based on multi-modal data fusion by integrating memory transformer and DDQN
IF 4.2 | CAS Tier 3 (Computer Science)
Image and Vision Computing, Volume 156, Article 105484. Pub Date: 2025-03-02. DOI: 10.1016/j.imavis.2025.105484
Li Zha, Chen Gong, Kunfeng Lv
Abstract: In the field of autonomous driving, real-time localization and navigation are the core technologies that ensure vehicle safety and precise operation. With advancements in sensor technology and computing power, multi-modal data fusion has become a key method for enhancing the environmental perception capabilities of autonomous vehicles. This study explores a novel vision-language navigation technology to achieve precise navigation of autonomous vehicles in complex environments. By integrating information from radar, sonar, 5G networks, Wi-Fi, Bluetooth, and a 360-degree visual information collection device mounted on the vehicle's roof, the model fully exploits rich multi-source data. The model uses a Memory Transformer for efficient data encoding and a data fusion strategy with a self-attention network, ensuring a balance between feature integrity and real-time performance. The encoded data is then fed into a DDQN-based vehicle navigation algorithm built on an automatically growing environmental target knowledge graph and large-scale scene maps, enabling continuous learning and optimization in real-world environments. Comparative experiments show that the proposed model outperforms existing SOTA models, particularly benefiting from the macro-spatial reference of large-scale scene maps, the background knowledge support of the automatically growing knowledge graph, and the experience-optimized navigation strategies of the DDQN algorithm. In the comparison with SOTA models, the proposed model achieved scores of 3.99, 0.65, 0.67, 0.65, 0.63, and 0.63 on the six metrics NE, SR, OSR, SPL, CLS, and DTW, respectively. These results demonstrate a significant enhancement of the intelligent localization and navigation capabilities of autonomous vehicles.
Citations: 0
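The DDQN component follows the standard double Q-learning rule: the online network selects the greedy action for the next state and the target network evaluates it. The sketch below shows that target computation over an already-encoded state vector; the tiny networks, state dimension, and action count are placeholders standing in for the fused multi-modal encoding described above.

import torch
import torch.nn as nn

def ddqn_targets(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Standard Double DQN target: action selected by the online net,
    value of that action taken from the target net."""
    with torch.no_grad():
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)   # action selection
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)  # action evaluation
        return rewards + gamma * next_q * (1.0 - dones)

# Tiny illustrative Q-networks over a 16-d encoded state with 4 discrete actions.
online = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4))
target = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4))
targets = ddqn_targets(online, target, torch.zeros(8), torch.randn(8, 16), torch.zeros(8))
print(targets.shape)   # torch.Size([8])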