Image and Vision Computing: Latest Articles

Cross-set data augmentation for semi-supervised medical image segmentation
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing Pub Date : 2025-02-01 DOI: 10.1016/j.imavis.2024.105407
Qianhao Wu, Xixi Jiang, Dong Zhang, Yifei Feng, Jinhui Tang
{"title":"Cross-set data augmentation for semi-supervised medical image segmentation","authors":"Qianhao Wu ,&nbsp;Xixi Jiang ,&nbsp;Dong Zhang ,&nbsp;Yifei Feng ,&nbsp;Jinhui Tang","doi":"10.1016/j.imavis.2024.105407","DOIUrl":"10.1016/j.imavis.2024.105407","url":null,"abstract":"<div><div>Medical image semantic segmentation is a fundamental yet challenging research task. However, training a fully supervised model for this task requires a substantial amount of pixel-level annotated data, which poses a significant challenge for annotators due to the necessity of specialized medical expert knowledge. To mitigate the labeling burden, a semi-supervised medical image segmentation model that leverages both a small quantity of labeled data and a substantial amount of unlabeled data has attracted prominent attention. However, the performance of current methods is constrained by the distribution mismatch problem between limited labeled and unlabeled datasets. To address this issue, we propose a cross-set data augmentation strategy aimed at minimizing the feature divergence between labeled and unlabeled data. Our approach involves mixing labeled and unlabeled data, as well as integrating ground truth with pseudo-labels to produce augmented samples. By employing three distinct cross-set data augmentation strategies, we enhance the diversity of the training dataset and fully exploit the perturbation space. Our experimental results on COVID-19 CT data, spinal cord gray matter MRI data and prostate T2-weighted MRI data substantiate the efficacy of our proposed approach. The code has been released at: <span><span>CDA</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"154 ","pages":"Article 105407"},"PeriodicalIF":4.2,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143138487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
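The abstract does not give the exact mixing rule, so the sketch below only illustrates one plausible reading of "mixing labeled and unlabeled data and integrating ground truth with pseudo-labels": a CutMix-style cross-set blend. The function names, the rectangular-cut strategy, and the Beta-sampled mixing ratio are assumptions for illustration, not the authors' implementation.

```python
import torch

def rand_bbox(h, w, lam, generator=None):
    """Sample a rectangular cut region whose area fraction is roughly (1 - lam)."""
    cut_ratio = (1.0 - lam) ** 0.5
    cut_h, cut_w = int(h * cut_ratio), int(w * cut_ratio)
    cy = torch.randint(0, h, (1,), generator=generator).item()
    cx = torch.randint(0, w, (1,), generator=generator).item()
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    return y1, y2, x1, x2

def cross_set_cutmix(x_lab, y_lab, x_unlab, y_pseudo, alpha=1.0):
    """Paste a patch from an unlabeled image (with its pseudo-label) into a
    labeled image (with its ground truth), producing one augmented pair.
    x_*: (B, C, H, W) images; y_*: (B, H, W) integer masks."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    _, _, h, w = x_lab.shape
    y1, y2, x1, x2 = rand_bbox(h, w, lam)
    x_mix, y_mix = x_lab.clone(), y_lab.clone()
    x_mix[:, :, y1:y2, x1:x2] = x_unlab[:, :, y1:y2, x1:x2]
    y_mix[:, y1:y2, x1:x2] = y_pseudo[:, y1:y2, x1:x2]
    return x_mix, y_mix

# Example: blend a labeled batch with pseudo-labeled unlabeled data.
x_l, y_l = torch.randn(2, 1, 128, 128), torch.randint(0, 2, (2, 128, 128))
x_u = torch.randn(2, 1, 128, 128)
y_p = torch.randint(0, 2, (2, 128, 128))   # pseudo-labels, e.g. from a teacher model
x_aug, y_aug = cross_set_cutmix(x_l, y_l, x_u, y_p)
```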
AGSAM-Net: UAV route planning and visual guidance model for bridge surface defect detection
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing Pub Date : 2025-02-01 DOI: 10.1016/j.imavis.2025.105416
Rongji Li, Ziqian Wang
{"title":"AGSAM-Net: UAV route planning and visual guidance model for bridge surface defect detection","authors":"Rongji Li,&nbsp;Ziqian Wang","doi":"10.1016/j.imavis.2025.105416","DOIUrl":"10.1016/j.imavis.2025.105416","url":null,"abstract":"<div><div>Crack width is a critical indicator of bridge structural health. This paper proposes a UAV-based method for detecting bridge surface defects and quantifying crack width, aiming to improve efficiency and accuracy. The system integrates a UAV with a visual navigation system to capture high-resolution images (7322 × 5102 pixels) and GPS data, followed by image resolution computation and plane correction. For crack detection and segmentation, we introduce AGSAM-Net, a multi-class semantic segmentation network enhanced with attention gating to accurately identify and segment cracks at the pixel level. The system processes 8064 × 6048 pixel images in 2.4 s, with a detection time of 0.5 s per 540 × 540 pixel crack bounding box. By incorporating distance data, the system achieves over 90% accuracy in crack width quantification across multiple datasets. The study also explores potential collaboration with robotic arms, offering new insights into automated bridge maintenance.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"154 ","pages":"Article 105416"},"PeriodicalIF":4.2,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143138666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
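The paper quantifies crack width by combining pixel measurements with distance data; the exact pipeline is not given in the abstract. The snippet below only illustrates the standard pinhole-camera ground-sampling-distance conversion that such a step typically relies on. The focal length and pixel pitch values are placeholders, not the camera used in the paper.

```python
def ground_sampling_distance(distance_m, focal_length_mm, pixel_pitch_um):
    """Physical size (mm) covered by one pixel on a plane `distance_m` away,
    assuming a fronto-parallel surface and a pinhole camera model."""
    pixel_pitch_mm = pixel_pitch_um / 1000.0
    return pixel_pitch_mm * (distance_m * 1000.0) / focal_length_mm

def crack_width_mm(width_px, distance_m, focal_length_mm=35.0, pixel_pitch_um=4.4):
    """Convert a crack width measured in pixels to millimetres."""
    return width_px * ground_sampling_distance(distance_m, focal_length_mm, pixel_pitch_um)

# Example: a 6-pixel-wide crack imaged from 3 m with the placeholder camera parameters.
print(round(crack_width_mm(6, 3.0), 2), "mm")   # ~2.26 mm
```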
Co-salient object detection with consensus mining and consistency cross-layer interactive decoding
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing Pub Date : 2025-02-01 DOI: 10.1016/j.imavis.2025.105414
Yanliang Ge, Jinghuai Pan, Junchao Ren, Min He, Hongbo Bi, Qiao Zhang
{"title":"Co-salient object detection with consensus mining and consistency cross-layer interactive decoding","authors":"Yanliang Ge ,&nbsp;Jinghuai Pan ,&nbsp;Junchao Ren ,&nbsp;Min He ,&nbsp;Hongbo Bi ,&nbsp;Qiao Zhang","doi":"10.1016/j.imavis.2025.105414","DOIUrl":"10.1016/j.imavis.2025.105414","url":null,"abstract":"<div><div>The main goal of co-salient object detection (CoSOD) is to extract a group of notable objects that appear together in the image. The existing methods face two major challenges: the first is that in some complex scenes or in the case of interference by other salient objects, the mining of consensus cues for co-salient objects is inadequate; the second is that other methods input consensus cues from top to bottom into the decoder, which ignores the compactness of the consensus and lacks cross-layer interaction. To solve the above problems, we propose a consensus mining and consistency cross-layer interactive decoding network, called CCNet, which consists of two key components, namely, a consensus cue mining module (CCM) and a consistency cross-layer interactive decoder (CCID). Specifically, the purpose of CCM is to fully mine the cross-consensus clues among the co-salient objects in the image group, so as to achieve the group consistency modeling of the group of images. Furthermore, CCID accepts features of different levels as input and receives semantic information of group consensus from CCM, which is used to guide features of other levels to learn higher-level feature representations and cross-layer interaction of group semantic consensus clues, thereby maintaining the consistency of group consensus cues and enabling accurate co-saliency map prediction. We evaluated the proposed CCNet using four widely accepted metrics across three challenging CoSOD datasets and the experimental results demonstrate that our proposed approach outperforms other existing state-of-the-art CoSOD methods, particularly on the CoSal2015 and CoSOD3k datasets. The results of our method are available at <span><span>https://github.com/jinghuaipan/CCNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"154 ","pages":"Article 105414"},"PeriodicalIF":4.2,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143138669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
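CCM's exact design is not spelled out in the abstract. A common way to model group consensus in CoSOD is to pool features across the whole image group and use the pooled descriptor to re-weight each image's features; the module below is only a schematic stand-in for that idea, with a hypothetical class name and a deliberately minimal projection head.

```python
import torch
import torch.nn as nn

class GroupConsensus(nn.Module):
    """Schematic consensus mining: average features over the image group,
    project the result, and use it to re-weight each image's feature map."""
    def __init__(self, channels):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(channels, channels), nn.Sigmoid())

    def forward(self, feats):                      # feats: (N, C, H, W), N images in one group
        group_vec = feats.mean(dim=(0, 2, 3))      # (C,) group-level descriptor
        weights = self.proj(group_vec)             # (C,) consensus channel weights
        return feats * weights.view(1, -1, 1, 1)   # apply the shared consensus to every image

feats = torch.randn(5, 64, 32, 32)                 # a group of 5 images
out = GroupConsensus(64)(feats)
```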
Transparency and privacy measures of biometric patterns for data processing with synthetic data using explainable artificial intelligence
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing Pub Date : 2025-02-01 DOI: 10.1016/j.imavis.2025.105429
Achyut Shankar, Hariprasath Manoharan, Adil O. Khadidos, Alaa O. Khadidos, Shitharth Selvarajan, S.B. Goyal
{"title":"Transparency and privacy measures of biometric patterns for data processing with synthetic data using explainable artificial intelligence","authors":"Achyut Shankar ,&nbsp;Hariprasath Manoharan ,&nbsp;Adil O. Khadidos ,&nbsp;Alaa O. Khadidos ,&nbsp;Shitharth Selvarajan ,&nbsp;S.B. Goyal","doi":"10.1016/j.imavis.2025.105429","DOIUrl":"10.1016/j.imavis.2025.105429","url":null,"abstract":"<div><div>In this paper the need of biometric authentication with synthetic data is analyzed for increasing the security of data in each transmission systems. Since more biometric patterns are represented the complexity of recognition changes where low security features are enabled in transmission process. Hence the process of increasing security is carried out with image biometric patterns where synthetic data is created with explainable artificial intelligence technique thereby appropriate decisions are made. Further sample data is generated at each case thereby all changing representations are minimized with increase in original image set values. Moreover the data flows at each identified biometric patterns are increased where partial decisive strategies are followed in proposed approach. Further more complete interpretabilities that are present in captured images or biometric patterns are reduced thus generated data is maximized to all end users. To verify the outcome of proposed approach four scenarios with comparative performance metrics are simulated where from the comparative analysis it is found that the proposed approach is less robust and complex at a rate of 4% and 6% respectively.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"154 ","pages":"Article 105429"},"PeriodicalIF":4.2,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143139146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
FPDIoU Loss: A loss function for efficient bounding box regression of rotated object detection
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing Pub Date : 2025-02-01 DOI: 10.1016/j.imavis.2024.105381
Siliang Ma, Yong Xu
{"title":"FPDIoU Loss: A loss function for efficient bounding box regression of rotated object detection","authors":"Siliang Ma ,&nbsp;Yong Xu","doi":"10.1016/j.imavis.2024.105381","DOIUrl":"10.1016/j.imavis.2024.105381","url":null,"abstract":"<div><div>Bounding box regression is one of the important steps of object detection. However, rotation detectors often involve a more complicated loss based on SkewIoU which is unfriendly to gradient-based training. Most of the existing loss functions for rotated object detection calculate the difference between two bounding boxes only focus on the deviation of area or each points distance (e.g., <span><math><msub><mrow><mi>L</mi></mrow><mrow><mi>S</mi><mi>m</mi><mi>o</mi><mi>o</mi><mi>t</mi><mi>h</mi><mo>−</mo><mi>L</mi><mn>1</mn></mrow></msub></math></span>, <span><math><msub><mrow><mi>L</mi></mrow><mrow><mi>R</mi><mi>o</mi><mi>t</mi><mi>a</mi><mi>t</mi><mi>e</mi><mi>d</mi><mi>I</mi><mi>o</mi><mi>U</mi></mrow></msub></math></span> and <span><math><msub><mrow><mi>L</mi></mrow><mrow><mi>P</mi><mi>I</mi><mi>o</mi><mi>U</mi></mrow></msub></math></span>). The calculation process of some loss functions is extremely complex (e.g. <span><math><msub><mrow><mi>L</mi></mrow><mrow><mi>K</mi><mi>F</mi><mi>I</mi><mi>o</mi><mi>U</mi></mrow></msub></math></span>). In order to improve the efficiency and accuracy of bounding box regression for rotated object detection, we proposed a novel metric for arbitrary shapes comparison based on minimum points distance, which takes most of the factors from existing loss functions for rotated object detection into account, i.e., the overlap or nonoverlapping area, the central points distance and the rotation angle. We also proposed a loss function called <span><math><msub><mrow><mi>L</mi></mrow><mrow><mi>F</mi><mi>P</mi><mi>D</mi><mi>I</mi><mi>o</mi><mi>U</mi></mrow></msub></math></span> based on four points distance for accurate bounding box regression focusing on faster and high quality anchor boxes. In the experiments, <span><math><mrow><mi>F</mi><mi>P</mi><mi>D</mi><mi>I</mi><mi>o</mi><mi>U</mi></mrow></math></span> loss has been applied to state-of-the-art rotated object detection (e.g., RTMDET, H2RBox) models training with three popular benchmarks of rotated object detection including DOTA, DIOR, HRSC2016 and two benchmarks of arbitrary orientation scene text detection including ICDAR 2017 RRC-MLT and ICDAR 2019 RRC-MLT, which achieves better performance than existing loss functions. The code is available at <span><span>https://github.com/JacksonMa618/FPDIoU</span><svg><path></path></svg></span></div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"154 ","pages":"Article 105381"},"PeriodicalIF":4.2,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143138206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
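The abstract describes a metric built from the distances between the four corner points of the predicted and ground-truth rotated boxes, normalized and combined with IoU-style terms. The loss below is only an illustrative guess at such a formulation, not the authors' exact FPDIoU definition: it assumes corner points are supplied in matching order, normalizes by the image diagonal, and takes the (rotated) IoU as an externally computed input.

```python
import torch

def four_point_distance_penalty(pred_corners, gt_corners, img_w, img_h):
    """Mean squared distance between corresponding corners of two rotated boxes,
    normalised by the squared image diagonal. Corners: (B, 4, 2) in matching order."""
    d2 = ((pred_corners - gt_corners) ** 2).sum(dim=-1)   # (B, 4) squared corner distances
    return d2.mean(dim=-1) / (img_w ** 2 + img_h ** 2)

def fpdiou_like_loss(iou, pred_corners, gt_corners, img_w, img_h):
    """Illustrative IoU-plus-corner-distance loss; `iou` is the rotated IoU computed
    elsewhere, e.g. by the detector's own SkewIoU routine."""
    return 1.0 - iou + four_point_distance_penalty(pred_corners, gt_corners, img_w, img_h)

# Example with dummy corner tensors for a batch of 2 boxes on a 640x640 image.
pred = torch.rand(2, 4, 2) * 100
gt = torch.rand(2, 4, 2) * 100
loss = fpdiou_like_loss(torch.tensor([0.7, 0.5]), pred, gt, 640, 640)
```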
GPLM: Enhancing underwater images with Global Pyramid Linear Modulation
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing Pub Date : 2025-02-01 DOI: 10.1016/j.imavis.2024.105361
Jinxin Shao, Haosu Zhang, Jianming Miao
{"title":"GPLM: Enhancing underwater images with Global Pyramid Linear Modulation","authors":"Jinxin Shao,&nbsp;Haosu Zhang,&nbsp;Jianming Miao","doi":"10.1016/j.imavis.2024.105361","DOIUrl":"10.1016/j.imavis.2024.105361","url":null,"abstract":"<div><div>Underwater imagery often suffers from challenges such as color distortion, low contrast, blurring, and noise due to the absorption and scattering of light in water. These degradations complicate visual interpretation and hinder subsequent image processing. Existing methods struggle to effectively address the complex, spatially varying degradations without prior environmental knowledge or may produce unnatural enhancements. To overcome these limitations, we propose a novel method called Global Pyramid Linear Modulation that integrates physical degradation modeling with deep learning for underwater image enhancement. Our approach extends Feature-wise Linear Modulation to a four-dimensional structure, enabling fine-grained, spatially adaptive modulation of feature maps. Our method captures multi-scale contextual information by incorporating a feature pyramid architecture with self-attention and feature fusion mechanisms, effectively modeling complex degradations. We validate our method by integrating it into the MixDehazeNet model and conducting experiments on benchmark datasets. Our approach significantly improves the Peak Signal-to-Noise Ratio, increasing from 28.6 dB to 30.6 dB on the EUVP-515-test dataset. Compared to recent state-of-the-art methods, our method consistently outperforms them by over 3 dB in PSNR on datasets with ground truth. It improves the Underwater Image Quality Measure by more than one on datasets without ground truth. Furthermore, we demonstrate the practical applicability of our method on a real-world underwater dataset, achieving substantial improvements in image quality metrics and visually compelling results. These experiments confirm that our method effectively addresses the limitations of existing techniques by adaptively modeling complex underwater degradations, highlighting its potential for underwater image enhancement tasks.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"154 ","pages":"Article 105361"},"PeriodicalIF":4.2,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143138209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
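GPLM is described as extending Feature-wise Linear Modulation (FiLM) to spatially adaptive, per-location scale and shift parameters. The snippet below shows only that basic idea: a small conv head predicts per-pixel gamma and beta maps from a conditioning feature. The module name, the conditioning path, and the omitted pyramid/self-attention parts are assumptions and simplifications, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SpatialFiLM(nn.Module):
    """FiLM-style modulation where gamma and beta vary per spatial location:
    a conv head predicts a (2C, H, W) map from a conditioning feature."""
    def __init__(self, channels, cond_channels):
        super().__init__()
        self.head = nn.Conv2d(cond_channels, 2 * channels, kernel_size=3, padding=1)

    def forward(self, x, cond):                 # x: (B, C, H, W), cond: (B, C_cond, H, W)
        gamma, beta = self.head(cond).chunk(2, dim=1)
        return (1 + gamma) * x + beta           # identity mapping when gamma = beta = 0

x = torch.randn(1, 32, 64, 64)
cond = torch.randn(1, 16, 64, 64)
y = SpatialFiLM(32, 16)(x, cond)
```

In plain FiLM the same gamma and beta would be broadcast over all spatial positions; predicting them as maps is what makes the modulation spatially adaptive.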
IRPE: Instance-level reconstruction-based 6D pose estimator
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing Pub Date : 2025-02-01 DOI: 10.1016/j.imavis.2024.105340
Le Jin, Guoshun Zhou, Zherong Liu, Yuanchao Yu, Teng Zhang, Minghui Yang, Jun Zhou
{"title":"IRPE: Instance-level reconstruction-based 6D pose estimator","authors":"Le Jin ,&nbsp;Guoshun Zhou ,&nbsp;Zherong Liu ,&nbsp;Yuanchao Yu ,&nbsp;Teng Zhang ,&nbsp;Minghui Yang ,&nbsp;Jun Zhou","doi":"10.1016/j.imavis.2024.105340","DOIUrl":"10.1016/j.imavis.2024.105340","url":null,"abstract":"<div><div>The estimation of an object’s 6D pose is a fundamental task in modern commercial and industrial applications. Vision-based pose estimation has gained popularity due to its cost-effectiveness and ease of setup in the field. However, this type of estimation tends to be less robust compared to other methods due to its sensitivity to the operating environment. For instance, in robot manipulation applications, heavy occlusion and clutter are common, posing significant challenges. For safety and robustness in industrial environments, depth information is often leveraged instead of relying solely on RGB images. Nevertheless, even with depth information, 6D pose estimation in such scenarios still remains challenging. In this paper, we introduce a novel 6D pose estimation method that promotes the network’s learning of high-level object features through self-supervised learning and instance reconstruction. The feature representation of the reconstructed instance is subsequently utilized in direct 6D pose regression via a multi-task learning scheme. As a result, the proposed method can differentiate and retrieve each object instance from a scene that is heavily occluded and cluttered, thereby surpassing conventional pose estimators in such scenarios. Additionally, due to the standardized prediction of reconstructed image, our estimator exhibits robustness performance against variations in lighting conditions and color drift. This is a significant improvement over traditional methods that depend on pixel-level sparse or dense features. We demonstrate that our method achieves state-of-the-art performance (e.g., 85.4% on LM-O) on the most commonly used benchmarks with respect to the ADD(-S) metric. Lastly, we present a CLIP dataset that emulates intense occlusion scenarios of industrial environment and conduct a real-world experiment for manipulation applications to verify the effectiveness and robustness of our proposed method.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"154 ","pages":"Article 105340"},"PeriodicalIF":4.2,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143138239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
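The abstract couples instance reconstruction with direct 6D pose regression through multi-task learning, but the heads and loss weighting are not given. The sketch below only shows the generic shape of such a joint objective; the head layout, the translation-plus-quaternion pose parameterization, and the fixed loss weight are all hypothetical.

```python
import torch
import torch.nn as nn

class ReconPoseHeads(nn.Module):
    """Hypothetical heads: reconstruct the instance image and regress its pose
    (a 3-D translation plus a 4-D quaternion) from a shared feature map."""
    def __init__(self, channels):
        super().__init__()
        self.recon = nn.Conv2d(channels, 3, kernel_size=1)              # RGB reconstruction head
        self.pose = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(channels, 7))               # (tx, ty, tz, qw, qx, qy, qz)

    def forward(self, feat):
        return self.recon(feat), self.pose(feat)

def multitask_loss(recon, recon_gt, pose, pose_gt, w_recon=0.5):
    """Joint objective: reconstruction term plus pose regression term."""
    return w_recon * nn.functional.l1_loss(recon, recon_gt) + \
           nn.functional.smooth_l1_loss(pose, pose_gt)

feat = torch.randn(2, 64, 32, 32)
recon, pose = ReconPoseHeads(64)(feat)
loss = multitask_loss(recon, torch.randn_like(recon), pose, torch.randn(2, 7))
```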
CFENet: Context-aware Feature Enhancement Network for efficient few-shot object counting
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing Pub Date : 2025-02-01 DOI: 10.1016/j.imavis.2024.105383
Shihui Zhang, Gangzheng Zhai, Kun Chen, Houlin Wang, Shaojie Han
{"title":"CFENet: Context-aware Feature Enhancement Network for efficient few-shot object counting","authors":"Shihui Zhang ,&nbsp;Gangzheng Zhai ,&nbsp;Kun Chen ,&nbsp;Houlin Wang ,&nbsp;Shaojie Han","doi":"10.1016/j.imavis.2024.105383","DOIUrl":"10.1016/j.imavis.2024.105383","url":null,"abstract":"<div><div>Few-shot object counting (FSOC) is designed to estimate the number of objects in any category given a query image and several bounding boxes. Existing methods usually ignore shape information when extracting the appearance of exemplars from query images, resulting in reduced object localization accuracy and count estimates. Meanwhile, these methods also utilize a fixed inner product or convolution for similarity matching, which may introduce background interference and limit the matching of objects with significant intra-class differences. To address the above challenges, we propose a Context-aware Feature Enhancement Network (CFENet) for FSOC. Specifically, our network comprises three main modules: Hierarchical Perception Joint Enhancement Module (HPJEM), Learnable Similarity Matcher (LSM), and Feature Fusion Module (FFM). Firstly, HPJEM performs feature enhancement on the scale transformations of query images and the shapes of exemplars, improving the network’s ability to recognize dense objects. Secondly, LSM utilizes learnable dilated convolutions and linear layers to expand the similarity metric of a fixed inner product, obtaining similarity maps. Then convolution with a given kernel is performed on the similarity maps to get the weighted features. Finally, FFM further fuses weighted features with multi-scale features obtained by HPJEM. We conduct extensive experiments on the specialized few-shot dataset FSC-147 and the subsets Val-COCO and Test-COCO of the COCO dataset. Experimental results validate the effectiveness of our method and show competitive performance. To further verify the generalization of CFENet, we also conduct experiments on the car dataset CARPK.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"154 ","pages":"Article 105383"},"PeriodicalIF":4.2,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143138494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
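The LSM is said to replace a fixed inner-product match with learnable dilated convolutions and linear layers, but its precise wiring is not described. The module below is a rough schematic of that idea, with an assumed class name and an assumed cosine-similarity correlation: both the query features and the pooled exemplar vector pass through learnable layers before the similarity map is computed and used to weight the query features.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableSimilarity(nn.Module):
    """Schematic learnable matcher: learnable projections on both sides
    before the correlation, instead of a fixed inner product."""
    def __init__(self, channels, dilation=2):
        super().__init__()
        self.query_proj = nn.Conv2d(channels, channels, 3,
                                    padding=dilation, dilation=dilation)
        self.exemplar_proj = nn.Linear(channels, channels)

    def forward(self, query_feat, exemplar_feat):
        # query_feat: (B, C, H, W); exemplar_feat: (B, C) pooled from exemplar boxes
        q = self.query_proj(query_feat)
        e = self.exemplar_proj(exemplar_feat).unsqueeze(-1).unsqueeze(-1)   # (B, C, 1, 1)
        sim = F.cosine_similarity(q, e.expand_as(q), dim=1)                 # (B, H, W) similarity map
        return sim.unsqueeze(1) * query_feat                                # weighted query features

q = torch.randn(1, 64, 32, 32)
e = torch.randn(1, 64)
weighted = LearnableSimilarity(64)(q, e)
```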
Edge guided and Fourier attention-based Dual Interaction Network for scene text erasing
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing Pub Date : 2025-02-01 DOI: 10.1016/j.imavis.2024.105406
Ran Gong, Anna Zhu, Kun Liu
{"title":"Edge guided and Fourier attention-based Dual Interaction Network for scene text erasing","authors":"Ran Gong,&nbsp;Anna Zhu,&nbsp;Kun Liu","doi":"10.1016/j.imavis.2024.105406","DOIUrl":"10.1016/j.imavis.2024.105406","url":null,"abstract":"<div><div>Scene text erasing (STE) aims to remove the text regions and inpaint those regions with reasonable content in the image. It involves a potential task, i.e., scene text segmentation, in implicate or explicate ways. Most previous methods used cascaded or parallel pipelines to segment text in one branch and erase text in another branch. However, they have not fully explored the information between the two subtasks, i.e., using an interactive method to enhance each other. In this paper, we introduce a novel one-stage STE model called Dual Interaction Network (DINet), which encourages interaction between scene text segmentation and scene text erasing in an end-to-end manner. DINet adopts a shared encoder and two parallel decoders for text segmentation and erasing respectively. Specifically, the two decoders interact via an Interaction Enhancement Module (IEM) in each layer, aggregating the residual information from each other. To facilitate effective and efficient mutual enhancement between the dual tasks, we propose a novel Fourier Transform-based Attention Module (FTAM). In addition, we incorporate an Edge-Guided Module (EGM) into the text segmentation branch to better erase the text boundary regions and generate natural-looking images. Extensive experiments demonstrate that the DINet achieves state-of-the-art performances on several benchmarks. Furthermore, the ablation studies indicate the effectiveness and efficiency of our proposed modules in DINet.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"154 ","pages":"Article 105406"},"PeriodicalIF":4.2,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143138665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
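FTAM is described only as a Fourier Transform-based Attention Module. A common lightweight pattern for frequency-domain attention is to modulate features with a learnable filter in the Fourier domain (GFNet/FFC-style); the sketch below follows that pattern as an assumption about the general idea, not as the paper's actual design, and the class name is hypothetical.

```python
import torch
import torch.nn as nn

class FourierAttention(nn.Module):
    """Schematic frequency-domain attention: FFT the feature map, multiply by a
    learnable complex-valued filter, then inverse-FFT back."""
    def __init__(self, channels, height, width):
        super().__init__()
        # rfft2 keeps width // 2 + 1 frequency bins along the last axis;
        # the trailing dim of size 2 stores the real and imaginary parts.
        self.filter = nn.Parameter(torch.randn(channels, height, width // 2 + 1, 2) * 0.02)

    def forward(self, x):                          # x: (B, C, H, W)
        freq = torch.fft.rfft2(x, norm="ortho")
        freq = freq * torch.view_as_complex(self.filter)
        return torch.fft.irfft2(freq, s=x.shape[-2:], norm="ortho")

x = torch.randn(1, 32, 64, 64)
y = FourierAttention(32, 64, 64)(x)
```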
1D kernel distillation network for efficient image super-resolution
IF 4.2, CAS Tier 3, Computer Science
Image and Vision Computing Pub Date : 2025-02-01 DOI: 10.1016/j.imavis.2024.105411
Yusong Li, Longwei Xu, Weibin Yang, Dehua Geng, Mingyuan Xu, Zhiqi Dong, Pengwei Wang
{"title":"1D kernel distillation network for efficient image super-resolution","authors":"Yusong Li,&nbsp;Longwei Xu,&nbsp;Weibin Yang,&nbsp;Dehua Geng,&nbsp;Mingyuan Xu,&nbsp;Zhiqi Dong,&nbsp;Pengwei Wang","doi":"10.1016/j.imavis.2024.105411","DOIUrl":"10.1016/j.imavis.2024.105411","url":null,"abstract":"<div><div>Recently, there have been significant strides in single-image super-resolution, especially with the integration of transformers. However, the escalating computational demands of large models pose challenges for deployment on edge devices. Therefore, in pursuit of Efficient Image Super-Resolution (EISR), achieving a better balance between task computational complexity and image fidelity becomes imperative. In this paper, we introduce the 1D kernel distillation network (OKDN). Within this network, we have devised a lightweight 1D Large Kernel (OLK) block, incorporating a more lightweight yet highly effective attention mechanism. This block significantly expands the effective receptive field, enhancing performance while mitigating computational costs. Additionally, we develop a Channel Shift Enhanced Distillation (CSED) block to improve distillation efficiency, allocating more computational resources towards increasing network depth. We utilize methods involving partial channel shifting and global feature supervision (GFS) to further augment the effective receptive field. Furthermore, we introduce learnable Gaussian perturbation convolution (LGPConv) to enhance the model’s feature extraction and performance capabilities while upholding low computational complexity. Experimental results demonstrate that our proposed approach achieves superior results with significantly lower computational complexity compared to state-of-the-art models. The code is available at <span><span>https://github.com/satvio/OKDN</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"154 ","pages":"Article 105411"},"PeriodicalIF":4.2,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143138668","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
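The OLK block is described as a lightweight 1D large-kernel attention design, but its internals are not in the abstract. The block below illustrates the generic trick such designs rely on: decomposing a large square kernel into depthwise 1×k and k×1 strips whose response gates the input. It is one plausible reading for illustration, with an assumed class name and kernel size, not the paper's exact block.

```python
import torch
import torch.nn as nn

class OneDLargeKernelAttention(nn.Module):
    """Illustrative 1-D large-kernel attention: depthwise 1xk and kx1 convolutions
    approximate a large square receptive field at much lower cost; their combined
    response gates the input features."""
    def __init__(self, channels, k=31):
        super().__init__()
        self.horizontal = nn.Conv2d(channels, channels, (1, k), padding=(0, k // 2), groups=channels)
        self.vertical = nn.Conv2d(channels, channels, (k, 1), padding=(k // 2, 0), groups=channels)
        self.mix = nn.Conv2d(channels, channels, 1)   # pointwise channel mixing

    def forward(self, x):
        attn = self.mix(self.vertical(self.horizontal(x)))
        return x * torch.sigmoid(attn)                # gate input with the large-kernel response

x = torch.randn(1, 48, 64, 64)
y = OneDLargeKernelAttention(48)(x)
```

With k = 31, the two strip convolutions cover roughly a 31×31 neighbourhood while using about 2k depthwise weights per channel instead of k², which is the usual efficiency argument for 1-D large kernels.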