Latest Publications in Image and Vision Computing

CPFSSR: Combined permuted self-attention and fast Fourier transform-based network for stereo image super-resolution
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing. Pub Date: 2026-04-01; Epub Date: 2026-01-13; DOI: 10.1016/j.imavis.2025.105870
Wenwu Luo, Jing Wu, Feng Huang, Yunxiang Li
{"title":"CPFSSR: Combined permuted self-attention and fast Fourier transform-based network for stereo image super-resolution","authors":"Wenwu Luo ,&nbsp;Jing Wu ,&nbsp;Feng Huang,&nbsp;Yunxiang Li","doi":"10.1016/j.imavis.2025.105870","DOIUrl":"10.1016/j.imavis.2025.105870","url":null,"abstract":"<div><div>The pursuit of high-fidelity stereo image super-resolution (SR) is paramount for 3D vision applications. However, existing Transformer-based methods often suffer from high computational complexity and limited effectiveness in capturing long-range cross-view dependencies. To address these issues, we propose a combined permuted self-attention and fast Fourier transform-based network for stereo image SR (CPFSSR), a novel network that combines a permuted Swin Fourier Transformer block (PSFTB) with a deep cross-attention module (DCAM) to tackle these dual challenges. The PSFTB employs a permuted self-attention mechanism and fast Fourier convolution to achieve global receptive fields with linear computational complexity, and captures intra-view contextual details. For better fusion, a DCAM enables adaptive hierarchical interaction between views. In addition, we propose a spatial frequency reinforcement block (SFRB) to enhance the extraction of complex frequency information using fast Fourier convolution. Rigorous evaluation of benchmarks shows that CPFSSR sets a new state-of-the-art, outperforming existing methods by an average on the Flickr1024, Middlebury, KITTI2012, and KITTI2015 datasets. Visual assessments also confirm its superiority in reconstructing fine natural textures with minimal artifacts. The proposed method achieves a trade-off between parametric and stereo image SR task performance and is suitable for accurate high-resolution image reconstruction. The source code is available at <span><span>https://github.com/Flt-Flag/CPFSSR</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"168 ","pages":"Article 105870"},"PeriodicalIF":4.2,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146174933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
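As a rough illustration of the fast Fourier convolution idea this abstract leans on, the PyTorch sketch below shows how a pointwise convolution applied in the frequency domain mixes information across the whole image in one step. The module name, shapes, and the 1x1 spectral mixing are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of a fast-Fourier-convolution spectral branch (illustrative,
# not the CPFSSR implementation). A 1x1 convolution on the real and imaginary
# parts of the 2D FFT mixes information globally in a single step, giving an
# image-wide receptive field at near-linear cost in the pixel count.
import torch
import torch.nn as nn

class SpectralBranch(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Real and imaginary parts are stacked along the channel axis.
        self.conv = nn.Conv2d(2 * channels, 2 * channels, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        spec = torch.fft.rfft2(x, norm="ortho")          # (b, c, h, w//2+1), complex
        f = torch.cat([spec.real, spec.imag], dim=1)     # (b, 2c, h, w//2+1)
        f = self.act(self.conv(f))
        real, imag = f.chunk(2, dim=1)
        return torch.fft.irfft2(torch.complex(real, imag), s=(h, w), norm="ortho")

x = torch.randn(1, 16, 64, 64)
print(SpectralBranch(16)(x).shape)  # torch.Size([1, 16, 64, 64])
```

Because the FFT itself is O(HW log HW) and the spectral mixing is pointwise, designs of this kind keep global context cheap compared with full self-attention.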
CNN-CECA: Underwater image enhancement via CNN-driven nonlinear curve estimation and channel-wise attention in multi-color spaces
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing. Pub Date: 2026-03-01; Epub Date: 2026-01-26; DOI: 10.1016/j.imavis.2026.105916
Imran Afzal, Guo Jichang, Fazeela Siddiqui, Muhammad Fahad
{"title":"CNN-CECA: Underwater image enhancement via CNN-driven nonlinear curve estimation and channel-wise attention in multi-color spaces","authors":"Imran Afzal,&nbsp;Guo Jichang,&nbsp;Fazeela Siddiqui,&nbsp;Muhammad Fahad","doi":"10.1016/j.imavis.2026.105916","DOIUrl":"10.1016/j.imavis.2026.105916","url":null,"abstract":"<div><div>High-quality underwater images are essential for marine exploration, environmental monitoring, and scientific analysis. However, they are degraded by light attenuation, scattering, and wavelength-dependent absorption, which cause color shifts, low contrast, and detail loss. Furthermore, many existing deep learning techniques function as black boxes, offering limited interpretability and often generalizing poorly across diverse underwater conditions. To address this, we propose CNN-CECA, a novel deep learning framework whose core innovation is the hybrid integration of a convolutional backbone with physically-inspired, non-linear curve estimation across multiple color spaces. A lightweight CNN adjusts brightness, contrast, and color balance, and ResNet-50 guides the analysis of polynomial, sigmoid, and exponential curves in RGB, HSV, and CIELab, enabling both global and local adaptation. A key component is our novel Triple Channel-wise Attention (TCA) module, which fuses results across the three color spaces, dynamically allocating weights to recover natural colors and delicate structures. Post-processing with contrast stretching and edge sharpening adds final refinement while preserving efficiency for real-time use. Extensive experiments on synthetic and real-world datasets (e.g., UIEB, UCCS, EUVP, and NYU-v2) demonstrate superior quantitative scores and visually faithful restorations compared with traditional and state-of-the-art methods. Ablation studies verify the contributions of curve estimation and attention. This interpretable and adaptive approach offers a robust, scalable, and efficient solution for underwater image enhancement and is broadly applicable to vision tasks supporting autonomous platforms and human operators. The approach generalizes well across scenes and varying water conditions globally.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"167 ","pages":"Article 105916"},"PeriodicalIF":4.2,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146078417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
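To make the nonlinear curve-estimation idea concrete, here is a minimal sketch in which a tiny network predicts per-image scalar parameters and fixed curve families (polynomial, sigmoid, and gamma-style exponential) remap pixel intensities. The parameter names and exact curve forms are assumptions for illustration; the paper's actual definitions and its RGB/HSV/CIELab handling are not reproduced here.

```python
# Illustrative curve-adjustment sketch, not the CNN-CECA model: a small head
# predicts three scalars per image, each steering one curve family.
import torch
import torch.nn as nn

class CurveAdjust(nn.Module):
    def __init__(self):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 3), nn.Tanh()
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a, b, g = self.head(x).unbind(dim=1)               # each: (batch,)
        a = a.view(-1, 1, 1, 1)
        b = b.view(-1, 1, 1, 1)
        g = g.view(-1, 1, 1, 1)
        poly = x + a * x * (1.0 - x)                        # quadratic residual curve
        sig = torch.sigmoid(4.0 * (x - 0.5) * (1.0 + b))    # contrast S-curve
        expo = x ** torch.exp(g)                            # gamma-style curve
        return ((poly + sig + expo) / 3.0).clamp(0.0, 1.0)

img = torch.rand(2, 3, 64, 64)                              # values in [0, 1]
print(CurveAdjust()(img).shape)
```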
Integrating spatial features and dynamically learned temporal features via contrastive learning for video temporal grounding in LLM
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing. Pub Date: 2026-03-01; Epub Date: 2026-01-05; DOI: 10.1016/j.imavis.2026.105895
Peifu Wang, Yixiong Liang, Yigang Cen, Lihui Cen, Zhe Qu, Jingling Liu, Shichao Kan
{"title":"Integrating spatial features and dynamically learned temporal features via contrastive learning for video temporal grounding in LLM","authors":"Peifu Wang ,&nbsp;Yixiong Liang ,&nbsp;Yigang Cen ,&nbsp;Lihui Cen ,&nbsp;Zhe Qu ,&nbsp;Jingling Liu ,&nbsp;Shichao Kan","doi":"10.1016/j.imavis.2026.105895","DOIUrl":"10.1016/j.imavis.2026.105895","url":null,"abstract":"<div><div>Video temporal grounding (VTG) is crucial for fine-grained temporal understanding in vision-language tasks. While large vision-language models (LVLMs) have shown promising results through image–text alignment and video-instruction tuning, they represent videos as static sequences of sampled frames processed by image-based vision encoders, inherently limiting their capacity to capture dynamic and sequential information effectively, leading to suboptimal performance. To address this, we propose integrating spatial features with dynamically learned temporal features using contrastive learning. Temporal features are dynamically extracted by learning a set of temporal query tokens, which prompt temporal feature extraction via contrastive alignment between video sequences and their corresponding descriptions. On the other hand, VTG based on large language models are always supervised solely through the language modeling loss, which is insufficient for effectively guiding such tasks. Thus, the VTG model in our method is trained with a temporal localization loss that combines mean squared error (MSE), intersection-over-union (IoU) of the temporal range, and cosine similarity of temporal embeddings, which is designed to be applicable to large language models. Our experiments on benchmark datasets demonstrate the effectiveness of the proposed method.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"167 ","pages":"Article 105895"},"PeriodicalIF":4.2,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145927624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
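The combined temporal-localization loss the abstract describes (MSE plus temporal IoU plus cosine similarity of embeddings) maps naturally onto a few lines of PyTorch. The weights and the normalized (start, end) interval parameterization below are assumptions for illustration, not the paper's specification.

```python
# Sketch of a combined temporal-localization loss: MSE on the predicted span,
# an IoU term on the temporal interval, and a cosine term on embeddings.
import torch
import torch.nn.functional as F

def temporal_loss(pred_span, gt_span, pred_emb, gt_emb,
                  w_mse=1.0, w_iou=1.0, w_cos=1.0):
    # pred_span, gt_span: (batch, 2) normalized (start, end) with start < end.
    mse = F.mse_loss(pred_span, gt_span)

    inter = (torch.min(pred_span[:, 1], gt_span[:, 1])
             - torch.max(pred_span[:, 0], gt_span[:, 0])).clamp(min=0)
    union = (torch.max(pred_span[:, 1], gt_span[:, 1])
             - torch.min(pred_span[:, 0], gt_span[:, 0])).clamp(min=1e-6)
    iou_loss = 1.0 - (inter / union).mean()

    cos_loss = 1.0 - F.cosine_similarity(pred_emb, gt_emb, dim=-1).mean()
    return w_mse * mse + w_iou * iou_loss + w_cos * cos_loss

pred = torch.tensor([[0.20, 0.60]])
gt = torch.tensor([[0.25, 0.70]])
print(temporal_loss(pred, gt, torch.randn(1, 256), torch.randn(1, 256)))
```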
HDD-Unet: A Unet-based architecture for low-light image enhancement
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing. Pub Date: 2026-03-01; Epub Date: 2025-12-24; DOI: 10.1016/j.imavis.2025.105889
Elissavet Batziou, Konstantinos Ioannidis, Ioannis Patras, Stefanos Vrochidis, Ioannis Kompatsiaris
{"title":"HDD-Unet: A Unet-based architecture for low-light image enhancement","authors":"Elissavet Batziou ,&nbsp;Konstantinos Ioannidis ,&nbsp;Ioannis Patras ,&nbsp;Stefanos Vrochidis ,&nbsp;Ioannis Kompatsiaris","doi":"10.1016/j.imavis.2025.105889","DOIUrl":"10.1016/j.imavis.2025.105889","url":null,"abstract":"<div><div>Low-light imaging has become a popular topic in image processing, with the quality enhancement of low light images being as a significant challenge, due to the difficulty in retaining colors, patterns, texture and style when generating a normal light image. Our objectives are mainly to firstly better preserve texture regions in image enhancement, while, secondly, preserving colors via color histogram blocks and, finally, to enhance the quality of image through dense denoising blocks. Our proposed novel framework, namely HDD-Unet, is a double Unet based on photorealistic style transfer for low-light image enhancement. The proposed low-light image enhancement method combines color histogram-based fusion, Haar wavelet pooling, dense-denoising blocks and U-net as a backbone architecture to enhance the contrast, reduce noise, and improve the visibility of low light images. Experimental results demonstrate that our proposed method outperforms existing methods in terms of PSNR and SSIM quantitative evaluation metrics, reaching or outperforming state-of-the-art accuracy, but with less resources. We also conduct an ablation study to investigate the impact of our approach on overexposed images, and systematic analysis on the late fusion weighting parameters. Multiple experiments were conducted with artificial noise inserted to accomplish more efficient comparison. The results show that the proposed framework enhances accurately images with various gamma corrections. The proposed method represents a significant advance in the field of low light image enhancement and has the potential to address several challenges associated with low light imaging.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"167 ","pages":"Article 105889"},"PeriodicalIF":4.2,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145842588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
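One named ingredient, Haar wavelet pooling, can be sketched directly: a 2x2 Haar transform downsamples a feature map into one low-frequency and three detail sub-bands, so the pooling step keeps texture information rather than discarding it. Sub-band naming conventions vary; this is a generic version, not the HDD-Unet code.

```python
# Generic 2x2 Haar wavelet pooling sketch (orthonormal scaling).
import torch

def haar_pool(x: torch.Tensor):
    # x: (b, c, h, w) with even h and w.
    a = x[:, :, 0::2, 0::2]
    b = x[:, :, 0::2, 1::2]
    c = x[:, :, 1::2, 0::2]
    d = x[:, :, 1::2, 1::2]
    ll = (a + b + c + d) / 2.0   # low-frequency approximation
    lh = (a - b + c - d) / 2.0   # horizontal detail
    hl = (a + b - c - d) / 2.0   # vertical detail
    hh = (a - b - c + d) / 2.0   # diagonal detail
    return ll, lh, hl, hh

x = torch.randn(1, 8, 32, 32)
print([t.shape for t in haar_pool(x)])  # four (1, 8, 16, 16) sub-bands
```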
Hierarchical texture-aware image inpainting via contextual attention and multi-scale fusion
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing. Pub Date: 2026-03-01; Epub Date: 2025-12-16; DOI: 10.1016/j.imavis.2025.105875
Runing Li, Jiangyan Dai, Qibing Qin, Chengduan Wang, Yugen Yi, Jianzhong Wang
{"title":"Hierarchical texture-aware image inpainting via contextual attention and multi-scale fusion","authors":"Runing Li ,&nbsp;Jiangyan Dai ,&nbsp;Qibing Qin ,&nbsp;Chengduan Wang ,&nbsp;Yugen Yi ,&nbsp;Jianzhong Wang","doi":"10.1016/j.imavis.2025.105875","DOIUrl":"10.1016/j.imavis.2025.105875","url":null,"abstract":"<div><div>Image inpainting aims to restore missing regions in images with visually coherent and semantically plausible content. Although deep learning methods have achieved significant progress, current approaches still face challenges in handling large-area image inpainting tasks, often producing blurred textures or structurally inconsistent results. These limitations primarily stem from the insufficient exploitation of long-range dependencies and inadequate texture priors. To address these issues, we propose a novel two-stage image inpainting framework that integrates multi-directional texture priors with contextual information. In the first stage, we extract rich texture features from corrupted images using Gabor filters, which simulate human visual perception. These features are then fused to guide a texture inpainting network, where a Multi-Scale Dense Skip Connection (MSDSC) module is introduced to bridge semantic gaps across different feature levels. In the second stage, we design a hierarchical texture-aware guided image completion network that utilizes the repaired textures as auxiliary guidance. Specifically, a contextual attention module is incorporated to capture long-range spatial dependencies and enhance structural consistency. Extensive experiments conducted on three challenging benchmarks, such as CelebA-HQ, Places2, and Paris Street View, demonstrate that our method outperforms existing state-of-the-art approaches in both quantitative metrics and visual quality. The proposed framework significantly improves the realism and coherence of inpainting results, particularly for images with large missing regions or complex textures. The code is available at <span><span>https://github.com/Runing-Lab/HTA2I.git</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"167 ","pages":"Article 105875"},"PeriodicalIF":4.2,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145842592","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
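The multi-directional texture prior rests on standard Gabor filtering, which OpenCV exposes directly. A small orientation bank like the following produces the kind of texture features the first stage consumes; the filter parameters here are illustrative assumptions, not the paper's settings.

```python
# Sketch of a multi-directional Gabor texture prior: a bank of Gabor kernels
# at several orientations, applied to a grayscale image and stacked.
import cv2
import numpy as np

def gabor_texture_features(gray: np.ndarray, n_orient: int = 4) -> np.ndarray:
    # gray: (h, w) float32 image; returns (n_orient, h, w) filter responses.
    feats = []
    for k in range(n_orient):
        theta = k * np.pi / n_orient            # orientation of this filter
        kern = cv2.getGaborKernel(ksize=(15, 15), sigma=3.0, theta=theta,
                                  lambd=8.0, gamma=0.5, psi=0.0)
        feats.append(cv2.filter2D(gray, ddepth=cv2.CV_32F, kernel=kern))
    return np.stack(feats, axis=0)

img = np.random.rand(64, 64).astype(np.float32)
print(gabor_texture_features(img).shape)  # (4, 64, 64)
```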
LoRA-empowered efficient diffusion for accurate fine-grained detail rendering in real-image cartoonization
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing. Pub Date: 2026-03-01; Epub Date: 2026-01-06; DOI: 10.1016/j.imavis.2026.105898
Mingjin Liu, Yien Li
{"title":"LoRA-empowered efficient diffusion for accurate fine-grained detail rendering in real-image cartoonization","authors":"Mingjin Liu ,&nbsp;Yien Li","doi":"10.1016/j.imavis.2026.105898","DOIUrl":"10.1016/j.imavis.2026.105898","url":null,"abstract":"<div><div>Recent advances in generative models have enabled diverse applications, from text-to-image synthesis to artistic content creation. However, generating high-quality, domain-specific content — particularly for culturally unique styles like Chinese opera — remains challenging due to limited generalization on long-tail data and the high cost of fine-tuning with specialized datasets. To address these limitations, we propose DreamOpera, a novel framework for transforming real-world Chinese opera character photographs into stylized cartoon representations. Our approach leverages a two-step process: (1) feature extraction using a pre-trained encoder to capture key visual attributes (e.g., clothing, facial features), and (2) domain transformation via a LoRA-fine-tuned diffusion model trained on a small, unpaired dataset of cartoon-style opera images. This strategy bypasses the need for costly paired data while preserving fine-grained details. Experiments demonstrate that DreamOpera outperforms existing methods in generating high-fidelity, culturally nuanced artwork, offering practical value for cultural dissemination and digital art.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"167 ","pages":"Article 105898"},"PeriodicalIF":4.2,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145978221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
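For readers unfamiliar with the LoRA mechanism DreamOpera builds on: a frozen pretrained weight W is augmented with a trainable low-rank update B @ A scaled by alpha / r, so fine-tuning on a small unpaired style dataset touches only a few parameters. This is the generic LoRA formulation, not the DreamOpera implementation.

```python
# Minimal LoRA sketch: freeze the base linear layer and learn a low-rank
# additive update. B starts at zero so training begins at the pretrained model.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                  # frozen pretrained weight
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768))
print(layer(torch.randn(2, 768)).shape)  # torch.Size([2, 768])
```

Only A and B (here 2 x 8 x 768 values instead of 768 x 768) receive gradients, which is why LoRA keeps fine-tuning cheap on small, specialized datasets.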
DRM-YOLO: A YOLOv11-based structural optimization method for small object detection in UAV aerial imagery
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing. Pub Date: 2026-03-01; Epub Date: 2025-12-30; DOI: 10.1016/j.imavis.2025.105894
Hongbo Bi, Rui Dai, Fengyang Han, Cong Zhang
{"title":"DRM-YOLO: A YOLOv11-based structural optimization method for small object detection in UAV aerial imagery","authors":"Hongbo Bi,&nbsp;Rui Dai,&nbsp;Fengyang Han,&nbsp;Cong Zhang","doi":"10.1016/j.imavis.2025.105894","DOIUrl":"10.1016/j.imavis.2025.105894","url":null,"abstract":"<div><div>With the falling cost of UAVs and advances in automation, drones are increasingly applied in agriculture, inspection, and smart cities. However, small object detection remains difficult due to tiny targets, sparse features, and complex backgrounds. To tackle these challenges, this paper presents an improved small object detection framework for UAV imagery, optimized from the YOLOv11n architecture. First, the proposed MetaDWBlock integrates multi-branch depthwise separable convolutions with a lightweight MLP, and its hierarchical MetaDWStage enhances contextual and fine-grained feature modeling. Second, the Cross-scale Feature Fusion Module (CFFM) employs the CARAFE upsampling operator for precise fusion of shallow spatial and deep semantic features, improving multi-scale perception. Finally, a scale-, spatial-, and task-aware Dynamic Head with an added P2 branch forms a four-branch detection head, markedly boosting detection accuracy for tiny objects. Experimental results on the VisDrone2019 dataset demonstrate that the proposed DRM-YOLO model significantly outperforms the baseline YOLOv11n in small object detection tasks, achieving a 21.4% improvement in [email protected] and a 13.1% improvement in [email protected]. These results fully validate the effectiveness and practical value of the proposed method in enhancing the accuracy and robustness of small object detection in UAV aerial imagery. The code and results of our method are available at <span><span>https://github.com/DRdairuiDR/DRM--YOLO</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"167 ","pages":"Article 105894"},"PeriodicalIF":4.2,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145885584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
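The MetaDWBlock is described as multi-branch depthwise separable convolutions plus a lightweight MLP; a generic block in that spirit might look as follows. The branch count, kernel sizes, and residual arrangement are assumptions, not the authors' exact design.

```python
# Sketch of a multi-branch depthwise-separable block: parallel depthwise
# convolutions at different kernel sizes, summed, then a pointwise mix.
import torch
import torch.nn as nn

class MultiBranchDWBlock(nn.Module):
    def __init__(self, channels: int, kernels=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels)
            for k in kernels                     # groups=channels -> depthwise
        ])
        self.pointwise = nn.Conv2d(channels, channels, kernel_size=1)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = sum(branch(x) for branch in self.branches)
        return x + self.act(self.pointwise(y))   # residual connection

x = torch.randn(1, 32, 80, 80)
print(MultiBranchDWBlock(32)(x).shape)  # torch.Size([1, 32, 80, 80])
```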
OCC-MLLM-CoT: Self-correction enhanced occlusion recognition with large language models via 3D-aware supervision, chain-of-thoughts guidance
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing. Pub Date: 2026-03-01; Epub Date: 2025-12-24; DOI: 10.1016/j.imavis.2025.105881
Chaoyi Wang, Fangzhou Meng, Jun Pei, Lijie Xia, Jianpo Liu, Xiaobing Yuan, Xinhan Di
{"title":"OCC-MLLM-CoT: Self-correction enhanced occlusion recognition with large language models via 3D-aware supervision, chain-of-thoughts guidance","authors":"Chaoyi Wang ,&nbsp;Fangzhou Meng ,&nbsp;Jun Pei ,&nbsp;Lijie Xia ,&nbsp;Jianpo Liu ,&nbsp;Xiaobing Yuan ,&nbsp;Xinhan Di","doi":"10.1016/j.imavis.2025.105881","DOIUrl":"10.1016/j.imavis.2025.105881","url":null,"abstract":"<div><div>Comprehending occluded objects remains an underexplored challenge for existing large-scale visual–language multi-modal models. Current state-of-the-art multi-modal large models struggle to provide satisfactory performance in comprehending occluded objects despite using universal visual encoders and supervised learning strategies. To address this limitation, we propose OCC-MLLM-CoT, a multi-modal large vision–language framework that integrates 3D-aware supervision with Chain-of-Thoughts reasoning. Our approach consists of three key components: (1) a comprehensive framework combining a large multi-modal vision–language model with a specialized 3D reconstruction expert model; (2) a multi-modal Chain-of-Thoughts mechanism trained through both supervised and reinforcement learning strategies, enabling the model to develop advanced reasoning and self-reflection capabilities; and (3) a novel large-scale dataset containing 110,000 samples of occluded objects held in hand, specifically designed for multi-modal chain-of-thoughts reasoning. Experimental evaluations demonstrate that our proposed method achieves an 11.14% improvement in decision score, increasing from 0.6412 to 0.7526 compared to state-of-the-art multi-modal large language models.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"167 ","pages":"Article 105881"},"PeriodicalIF":4.2,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145885583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Object-level semantic alignment for enhancing fidelity in text-to-image generation with diffusion models
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing. Pub Date: 2026-03-01; Epub Date: 2026-01-29; DOI: 10.1016/j.imavis.2026.105923
Wenna Liu, Na Tian, Youjia Shao, Wencang Zhao
{"title":"Object-level semantic alignment for enhancing fidelity in text-to-image generation with diffusion models","authors":"Wenna Liu ,&nbsp;Na Tian ,&nbsp;Youjia Shao ,&nbsp;Wencang Zhao","doi":"10.1016/j.imavis.2026.105923","DOIUrl":"10.1016/j.imavis.2026.105923","url":null,"abstract":"<div><div>Text-to-image diffusion models have achieved remarkable success in generating diverse images. However, these models still face the challenge of semantic misalignment when handling text prompts containing multiple entities and attributes. It leads to object omission and attribute confusion, which impacts image fidelity. Inspired by the object-oriented structure implicit in the text prompt, we treat the text prompt as an organic system composed of objects, attributes and their interrelationships, aiming to unveil the underlying logic and semantic connections. We propose an object-centered attention map alignment method guided by the text’s syntactic structure to address the aforementioned issues. Firstly, we dynamically integrate textual semantic information through syntactic parsing and attention mechanisms, ensuring the model fully understands the prompt’s content. Then, we leverage fine-grained semantic-guided entity mask generation to accurately locate the target objects and alleviate the issue of object omission. Finally, we design a novel object-centric dual-loss binding function. The positive loss reinforces the association between objects and their attributes, while the negative loss mitigates the interference of irrelevant information, ensuring precise matching between objects and their attributes. Extensive experiments on the ABC-6K and AnE datasets demonstrate that the generated images confirm the model’s ability to accurately produce the objects and their corresponding visual attributes, further validating the effectiveness and superiority of our method.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"167 ","pages":"Article 105923"},"PeriodicalIF":4.2,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146173380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
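The "object-centric dual-loss binding function" suggests a positive term tying an attribute's cross-attention map to its object and a negative term suppressing overlap with unrelated objects. One plausible, purely illustrative formulation over normalized attention maps (not the paper's definition) is:

```python
# Hypothetical dual binding loss over cross-attention maps: pull an attribute's
# map toward its object's map, push it away from an unrelated object's map.
import torch

def binding_loss(attn_obj, attn_attr, attn_other, margin: float = 0.5):
    # Each map: (h, w) cross-attention for one token, normalized to sum to 1,
    # so the elementwise-minimum overlap of two maps lies in [0, 1].
    pos = 1.0 - torch.minimum(attn_obj, attn_attr).sum()    # want high overlap
    neg = torch.clamp(torch.minimum(attn_other, attn_attr).sum() - margin,
                      min=0.0)                               # want low overlap
    return pos + neg

h = w = 16
maps = [torch.softmax(torch.randn(h * w), dim=0).view(h, w) for _ in range(3)]
print(binding_loss(*maps))
```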
Dual-stage network combining transformer and hybrid convolutions for stereo image super-resolution
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing. Pub Date: 2026-03-01; Epub Date: 2025-12-29; DOI: 10.1016/j.imavis.2025.105892
Jintao Zeng, Aiwen Jiang, Feiqiang Liu
{"title":"Dual-stage network combining transformer and hybrid convolutions for stereo image super-resolution","authors":"Jintao Zeng ,&nbsp;Aiwen Jiang ,&nbsp;Feiqiang Liu","doi":"10.1016/j.imavis.2025.105892","DOIUrl":"10.1016/j.imavis.2025.105892","url":null,"abstract":"<div><div>Stereo image super-resolution aims to recover high-resolution image from given low-resolution left and right view images. Its challenges lie in fully feature extraction on each perspective and skillfully information integration from different perspectives. Among current methods, almost all super-resolution models employ single-stage strategy either based on transformer or convolution neural network(CNN). For highly nonlinear problems, single-stage network may not achieve very ideal performance with acceptable complexity. In this paper, we have proposed a dual-stage stereo image super-resolution network (DSSRNet) which integrates the complementary advantages of transformer and convolutions. Specifically, we design cross-stage attention module (CASM) to bridge informative feature transmission between successive stages. Moreover, we utilize fourier convolutions to efficiently model global and local features, benefiting restoring image details and texture. We have compared the proposed DSSRNet with several state-of-the-art methods on public benchmark datasets. The comprehensive experiments demonstrate that DSSRNet can restore clear structural features and richer texture details, achieving leading performance on PSNR, SSIM and LPIPS metrics with acceptable computation burden in stereo image super-resolution field. Related source codes and models will be released on <span><span>https://github.com/Zjtao-lab/DSSRNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"167 ","pages":"Article 105892"},"PeriodicalIF":4.2,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145885581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
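A cross-stage attention module of the kind this abstract calls CASM can be sketched as stage-2 features attending to stage-1 features over flattened tokens, so information learned early is re-injected later. The single-head design and shapes below are assumptions, not the DSSRNet code.

```python
# Sketch of cross-stage attention: queries from the later stage, keys and
# values from the earlier stage, fused through a residual connection.
import torch
import torch.nn as nn

class CrossStageAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, stage2: torch.Tensor, stage1: torch.Tensor) -> torch.Tensor:
        # stage2, stage1: (batch, tokens, dim) flattened feature maps.
        attn = torch.softmax(
            self.q(stage2) @ self.k(stage1).transpose(-2, -1) * self.scale,
            dim=-1)
        return stage2 + attn @ self.v(stage1)     # residual cross-stage fusion

s1 = torch.randn(1, 256, 64)
s2 = torch.randn(1, 256, 64)
print(CrossStageAttention(64)(s2, s1).shape)  # torch.Size([1, 256, 64])
```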