IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society - Latest Publications

Exploration of Learned Lifting-Based Transform Structures for Fully Scalable and Accessible Wavelet-Like Image Compression
Xinyue Li;Aous Naman;David Taubman
{"title":"Exploration of Learned Lifting-Based Transform Structures for Fully Scalable and Accessible Wavelet-Like Image Compression","authors":"Xinyue Li;Aous Naman;David Taubman","doi":"10.1109/TIP.2024.3482877","DOIUrl":"10.1109/TIP.2024.3482877","url":null,"abstract":"This paper provides a comprehensive study on features and performance of different ways to incorporate neural networks into lifting-based wavelet-like transforms, within the context of fully scalable and accessible image compression. Specifically, we explore different arrangements of lifting steps, as well as various network architectures for learned lifting operators. Moreover, we examine the impact of the number of learned lifting steps, the number of channels, the number of layers and the support of kernels in each learned lifting operator. To facilitate the study, we investigate two generic training methodologies that are simultaneously appropriate to a wide variety of lifting structures considered. Experimental results ultimately suggest that retaining fixed lifting steps from the base wavelet transform is highly beneficial. Moreover, we demonstrate that employing more learned lifting steps and more layers in each learned lifting operator do not contribute strongly to the compression performance. However, benefits can be obtained by utilizing more channels in each learned lifting operator. Ultimately, the learned wavelet-like transform proposed in this paper achieves over 25% bit-rate savings compared to JPEG 2000 with compact spatial support.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"6173-6188"},"PeriodicalIF":0.0,"publicationDate":"2024-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142488358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
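For readers unfamiliar with lifting, a lifting step splits a signal into even and odd samples and refines them with predict/update operators; the abstract studies replacing or augmenting these operators with small networks. The sketch below is a minimal, hypothetical one-level 1-D learned lifting step in PyTorch: the idea of keeping a fixed Haar-like base step alongside learned residual operators follows the abstract, but all layer widths, kernel sizes, and the residual form are illustrative assumptions, not the authors' configuration.

import torch
import torch.nn as nn

class LearnedLiftingStep(nn.Module):
    """One predict/update lifting pair: a fixed Haar-like step plus small learned
    residual operators (channel width and kernel support are assumptions)."""
    def __init__(self, channels=16, kernel=3):
        super().__init__()
        self.predict_net = nn.Sequential(
            nn.Conv1d(1, channels, kernel, padding=kernel // 2),
            nn.ReLU(),
            nn.Conv1d(channels, 1, kernel, padding=kernel // 2),
        )
        self.update_net = nn.Sequential(
            nn.Conv1d(1, channels, kernel, padding=kernel // 2),
            nn.ReLU(),
            nn.Conv1d(channels, 1, kernel, padding=kernel // 2),
        )

    def forward(self, x):                                  # x: (B, 1, N), N even
        even, odd = x[..., 0::2], x[..., 1::2]
        # fixed base-wavelet predict/update plus a learned correction
        detail = odd - even - self.predict_net(even)       # high-pass band
        approx = even + 0.5 * detail + self.update_net(detail)  # low-pass band
        return approx, detail

x = torch.randn(2, 1, 64)
low, high = LearnedLiftingStep()(x)
print(low.shape, high.shape)   # torch.Size([2, 1, 32]) torch.Size([2, 1, 32])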
A Bi-Directionally Fused Boundary Aware Network for Skin Lesion Segmentation
Feiniu Yuan;Yuhuan Peng;Qinghua Huang;Xuelong Li
{"title":"A Bi-Directionally Fused Boundary Aware Network for Skin Lesion Segmentation","authors":"Feiniu Yuan;Yuhuan Peng;Qinghua Huang;Xuelong Li","doi":"10.1109/TIP.2024.3482864","DOIUrl":"10.1109/TIP.2024.3482864","url":null,"abstract":"It is quite challenging to visually identify skin lesions with irregular shapes, blurred boundaries and large scale variances. Convolutional Neural Network (CNN) extracts more local features with abundant spatial information, while Transformer has the powerful ability to capture more global information but with insufficient spatial details. To overcome the difficulties in discriminating small or blurred skin lesions, we propose a Bi-directionally Fused Boundary Aware Network (BiFBA-Net). To utilize complementary features produced by CNNs and Transformers, we design a dual-encoding structure. Different from existing dual-encoders, our method designs a Bi-directional Attention Gate (Bi-AG) with two inputs and two outputs for crosswise feature fusion. Our Bi-AG accepts two kinds of features from CNN and Transformer encoders, and two attention gates are designed to generate two attention outputs that are sent back to the two encoders. Thus, we implement adequate exchanging of multi-scale information between CNN and Transformer encoders in a bi-directional and attention way. To perfectly restore feature maps, we propose a progressive decoding structure with boundary aware, containing three decoders with six supervised losses. The first decoder is a CNN network for producing more spatial details. The second one is a Partial Decoder (PD) for aggregating high-level features with more semantics. The last one is a Boundary Aware Decoder (BAD) proposed to progressively improve boundary accuracy. Our BAD uses residual structure and Reverse Attention (RA) at different scales to deeply mine structural and spatial details for refining lesion boundaries. Extensive experiments on public datasets show that our BiFBA-Net achieves higher segmentation accuracy, and has much better ability of boundary perceptions than compared methods. It also alleviates both over-segmentation of small lesions and under-segmentation of large ones.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"6340-6353"},"PeriodicalIF":0.0,"publicationDate":"2024-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142488442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
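The two-input/two-output gating described for the Bi-AG can be illustrated with a very small module in which each branch's features modulate the other branch before being sent back. This is a hedged sketch of the cross-gating idea only; the actual Bi-AG design, its multi-scale handling, and its internal layers are not specified in the abstract, so the 1x1 convolutions and sigmoid gates below are assumptions.

import torch
import torch.nn as nn

class BiDirectionalAttentionGate(nn.Module):
    """Hypothetical Bi-AG sketch: each branch gates the other branch's features,
    and both gated maps are returned so the two encoders can exchange information."""
    def __init__(self, channels):
        super().__init__()
        self.gate_from_cnn = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.gate_from_trans = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, cnn_feat, trans_feat):
        # attention computed from one branch modulates the other branch
        cnn_out = cnn_feat * self.gate_from_trans(trans_feat)
        trans_out = trans_feat * self.gate_from_cnn(cnn_feat)
        return cnn_out, trans_out   # sent back to the CNN and Transformer encoders

c = torch.randn(1, 64, 32, 32)
t = torch.randn(1, 64, 32, 32)
co, to = BiDirectionalAttentionGate(64)(c, t)
print(co.shape, to.shape)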
EviPrompt: A Training-Free Evidential Prompt Generation Method for Adapting Segment Anything Model in Medical Images
Yinsong Xu;Jiaqi Tang;Aidong Men;Qingchao Chen
{"title":"EviPrompt: A Training-Free Evidential Prompt Generation Method for Adapting Segment Anything Model in Medical Images","authors":"Yinsong Xu;Jiaqi Tang;Aidong Men;Qingchao Chen","doi":"10.1109/TIP.2024.3482175","DOIUrl":"10.1109/TIP.2024.3482175","url":null,"abstract":"Medical image segmentation is a critical task in clinical applications. Recently, the Segment Anything Model (SAM) has demonstrated potential for natural image segmentation. However, the requirement for expert labour to provide prompts, and the domain gap between natural and medical images pose significant obstacles in adapting SAM to medical images. To overcome these challenges, this paper introduces a novel prompt generation method named EviPrompt. The proposed method requires only a single reference image-annotation pair, making it a training-free solution that significantly reduces the need for extensive labelling and computational resources. First, prompts are automatically generated based on the similarity between features of the reference and target images, and evidential learning is introduced to improve reliability. Then, to mitigate the impact of the domain gap, committee voting and inference-guided in-context learning are employed, generating prompts primarily based on human prior knowledge and reducing reliance on extracted semantic information. EviPrompt represents an efficient and robust approach to medical image segmentation. We evaluate it across a broad range of tasks and modalities, confirming its efficacy. The source code is available at \u0000<uri>https://github.com/SPIresearch/EviPrompt</uri>\u0000.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"6204-6215"},"PeriodicalIF":0.0,"publicationDate":"2024-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142487459","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
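To make the similarity-based prompting step concrete, the sketch below picks a point prompt in the target image at the location whose feature is most similar to the mean foreground embedding of the reference image. It is only a minimal illustration of that one step under assumed tensor shapes; EviPrompt's evidential learning, committee voting, and in-context refinement are not reproduced here, and the function name is hypothetical.

import torch
import torch.nn.functional as F

def similarity_point_prompt(ref_feat, ref_mask, tgt_feat):
    """Pick a point prompt in the target image at the location most similar to the
    reference foreground (a sketch of similarity-based prompting only).
    ref_feat, tgt_feat: (C, H, W) feature maps; ref_mask: (H, W) binary mask."""
    c, h, w = ref_feat.shape
    fg = ref_feat[:, ref_mask.bool()].mean(dim=1)            # mean foreground embedding (C,)
    sim = F.cosine_similarity(tgt_feat.reshape(c, -1),        # similarity per target location
                              fg[:, None], dim=0)
    idx = sim.argmax().item()
    return divmod(idx, w)                                     # (row, col) point prompt

ref_feat = torch.randn(32, 16, 16)
tgt_feat = torch.randn(32, 16, 16)
ref_mask = torch.zeros(16, 16)
ref_mask[4:8, 4:8] = 1
print(similarity_point_prompt(ref_feat, ref_mask, tgt_feat))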
A Scalable Training Strategy for Blind Multi-Distribution Noise Removal
Kevin Zhang;Sakshum Kulshrestha;Christopher A. Metzler
{"title":"A Scalable Training Strategy for Blind Multi-Distribution Noise Removal","authors":"Kevin Zhang;Sakshum Kulshrestha;Christopher A. Metzler","doi":"10.1109/TIP.2024.3482185","DOIUrl":"10.1109/TIP.2024.3482185","url":null,"abstract":"Despite recent advances, developing general-purpose universal denoising and artifact-removal networks remains largely an open problem: Given fixed network weights, one inherently trades-off specialization at one task (e.g., removing Poisson noise) for performance at another (e.g., removing speckle noise). In addition, training such a network is challenging due to the curse of dimensionality: As one increases the dimensions of the specification-space (i.e., the number of parameters needed to describe the noise distribution) the number of unique specifications one needs to train for grows exponentially. Uniformly sampling this space will result in a network that does well at very challenging problem specifications but poorly at easy problem specifications, where even large errors will have a small effect on the overall mean squared error. In this work we propose training denoising networks using an adaptive-sampling/active-learning strategy. Our work improves upon a recently proposed universal denoiser training strategy by extending these results to higher dimensions and by incorporating a polynomial approximation of the true specification-loss landscape. This approximation allows us to reduce training times by almost two orders of magnitude. We test our method on simulated joint Poisson-Gaussian-Speckle noise and demonstrate that with our proposed training strategy, a single blind, generalist denoiser network can achieve peak signal-to-noise ratios within a uniform bound of specialized denoiser networks across a large range of operating conditions. We also capture a small dataset of images with varying amounts of joint Poisson-Gaussian-Speckle noise and demonstrate that a universal denoiser trained using our adaptive-sampling strategy outperforms uniformly trained baselines.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"6216-6226"},"PeriodicalIF":0.0,"publicationDate":"2024-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142487456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
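The core idea of sampling the specification space according to an estimated loss landscape can be shown in a few lines: fit a polynomial to observed (specification, loss) pairs and draw the next batch of noise specifications with probability proportional to the predicted loss, so hard regions are visited more often. This is a 1-D sketch under stated assumptions; the paper's multi-dimensional polynomial landscape model and active-learning schedule are not reproduced, and the function name is hypothetical.

import numpy as np

def adaptive_specification_sampler(loss_records, grid, degree=3, n_draws=32, rng=None):
    """Draw noise specifications (e.g. Poisson/Gaussian/speckle levels) with probability
    proportional to a polynomial fit of the observed training loss (illustrative only)."""
    rng = rng or np.random.default_rng()
    specs, losses = zip(*loss_records)                      # past (spec, loss) observations
    coeffs = np.polyfit(specs, losses, degree)              # polynomial loss-landscape estimate
    est = np.clip(np.polyval(coeffs, grid), 1e-6, None)     # predicted loss on a candidate grid
    probs = est / est.sum()
    return rng.choice(grid, size=n_draws, p=probs)          # specifications for the next batch

history = [(s, 0.1 + s ** 2) for s in np.linspace(0, 1, 20)]   # toy loss observations
next_specs = adaptive_specification_sampler(history, np.linspace(0, 1, 101))
print(next_specs[:5])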
Sparse Coding Inspired LSTM and Self-Attention Integration for Medical Image Segmentation
Zexuan Ji;Shunlong Ye;Xiao Ma
{"title":"Sparse Coding Inspired LSTM and Self-Attention Integration for Medical Image Segmentation","authors":"Zexuan Ji;Shunlong Ye;Xiao Ma","doi":"10.1109/TIP.2024.3482189","DOIUrl":"10.1109/TIP.2024.3482189","url":null,"abstract":"Accurate and automatic segmentation of medical images plays an essential role in clinical diagnosis and analysis. It has been established that integrating contextual relationships substantially enhances the representational ability of neural networks. Conventionally, Long Short-Term Memory (LSTM) and Self-Attention (SA) mechanisms have been recognized for their proficiency in capturing global dependencies within data. However, these mechanisms have typically been viewed as distinct modules without a direct linkage. This paper presents the integration of LSTM design with SA sparse coding as a key innovation. It uses linear combinations of LSTM states for SA’s query, key, and value (QKV) matrices to leverage LSTM’s capability for state compression and historical data retention. This approach aims to rectify the shortcomings of conventional sparse coding methods that overlook temporal information, thereby enhancing SA’s ability to do sparse coding and capture global dependencies. Building upon this premise, we introduce two innovative modules that weave the SA matrix into the LSTM state design in distinct manners, enabling LSTM to more adeptly model global dependencies and meld seamlessly with SA without accruing extra computational demands. Both modules are separately embedded into the U-shaped convolutional neural network architecture for handling both 2D and 3D medical images. Experimental evaluations on downstream medical image segmentation tasks reveal that our proposed modules not only excel on four extensively utilized datasets across various baselines but also enhance prediction accuracy, even on baselines that have already incorporated contextual modules. Code is available at \u0000<uri>https://github.com/yeshunlong/SALSTM</uri>\u0000.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"6098-6113"},"PeriodicalIF":0.0,"publicationDate":"2024-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142487460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
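The statement that Q, K and V are formed from linear combinations of LSTM states can be illustrated with a small module: run an LSTM over the token sequence, project its hidden states into queries, keys and values, and apply scaled dot-product attention. This is a sketch under assumed dimensions and a single-layer design; it is not the paper's two proposed modules, whose exact state/attention weaving is not specified in the abstract.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LSTMSelfAttention(nn.Module):
    """Sketch: build Q, K, V as linear combinations of LSTM hidden states so the
    attention inherits the LSTM's state compression and history retention."""
    def __init__(self, dim):
        super().__init__()
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)

    def forward(self, tokens):                      # tokens: (B, L, dim)
        states, _ = self.lstm(tokens)               # compressed, history-aware states
        q, k, v = self.to_q(states), self.to_k(states), self.to_v(states)
        attn = F.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        return attn @ v                              # globally mixed features

x = torch.randn(2, 196, 64)                          # e.g. flattened 14x14 patch tokens
print(LSTMSelfAttention(64)(x).shape)                # torch.Size([2, 196, 64])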
Enhancing Sample Utilization in Noise-Robust Deep Metric Learning With Subgroup-Based Positive-Pair Selection
Zhipeng Yu;Qianqian Xu;Yangbangyan Jiang;Yingfei Sun;Qingming Huang
{"title":"Enhancing Sample Utilization in Noise-Robust Deep Metric Learning With Subgroup-Based Positive-Pair Selection","authors":"Zhipeng Yu;Qianqian Xu;Yangbangyan Jiang;Yingfei Sun;Qingming Huang","doi":"10.1109/TIP.2024.3482182","DOIUrl":"10.1109/TIP.2024.3482182","url":null,"abstract":"The existence of noisy labels in real-world data negatively impacts the performance of deep learning models. Although much research effort has been devoted to improving the robustness towards noisy labels in classification tasks, the problem of noisy labels in deep metric learning (DML) remains under-explored. Existing noisy label learning methods designed for DML mainly discard suspicious noisy samples, resulting in a waste of the training data. To address this issue, we propose a noise-robust DML framework with SubGroup-based Positive-pair Selection (SGPS), which constructs reliable positive pairs for noisy samples to enhance the sample utilization. Specifically, SGPS first effectively identifies clean and noisy samples by a probability-based clean sample selectionstrategy. To further utilize the remaining noisy samples, we discover their potential similar samples based on the subgroup information given by a subgroup generation module and then aggregate them into informative positive prototypes for each noisy sample via a positive prototype generation module. Afterward, a new contrastive loss is tailored for the noisy samples with their selected positive pairs. SGPS can be easily integrated into the training process of existing pair-wise DML tasks, like image retrieval and face recognition. Extensive experiments on multiple synthetic and real-world large-scale label noise datasets demonstrate the effectiveness of our proposed method. Without any bells and whistles, our SGPS framework outperforms the state-of-the-art noisy label DML methods.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"6083-6097"},"PeriodicalIF":0.0,"publicationDate":"2024-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142487457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
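The positive-prototype step can be pictured as averaging the embeddings of a noisy sample's subgroup members and using the result as its positive in a contrastive loss. The sketch below shows only that aggregation under assumed inputs; the probability-based clean-sample selection, the subgroup generation module, and the tailored contrastive loss are not reproduced, and the function name is hypothetical.

import torch
import torch.nn.functional as F

def positive_prototypes(embeddings, subgroup_ids, noisy_mask):
    """For each noisy sample, aggregate the L2-normalized embeddings of its subgroup
    into a positive prototype. embeddings: (N, D); subgroup_ids, noisy_mask: (N,)."""
    z = F.normalize(embeddings, dim=1)
    protos = torch.zeros_like(z)
    for i in torch.nonzero(noisy_mask, as_tuple=False).flatten():
        members = subgroup_ids == subgroup_ids[i]
        members[i] = False                            # exclude the noisy sample itself
        if members.any():
            protos[i] = F.normalize(z[members].mean(dim=0), dim=0)
    return protos                                     # used as positives for the noisy samples

z = torch.randn(8, 16)
groups = torch.tensor([0, 0, 0, 1, 1, 2, 2, 2])
noisy = torch.tensor([False, True, False, False, True, False, False, True])
print(positive_prototypes(z, groups, noisy).shape)    # torch.Size([8, 16])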
Latitude-Redundancy-Aware All-Zero Block Detection for Fast 360-Degree Video Coding
Chang Yu;Xiaopeng Fan;Pengjin Chen;Yuxin Ni;Hengyu Man;Debin Zhao
{"title":"Latitude-Redundancy-Aware All-Zero Block Detection for Fast 360-Degree Video Coding","authors":"Chang Yu;Xiaopeng Fan;Pengjin Chen;Yuxin Ni;Hengyu Man;Debin Zhao","doi":"10.1109/TIP.2024.3482172","DOIUrl":"10.1109/TIP.2024.3482172","url":null,"abstract":"The sphere-to-plane projection of 360-degree video introduces substantial stretched redundant data, which is discarded when reprojected to the 3D sphere for display. Consequently, encoding and transmitting such redundant data is unnecessary. Highly redundant blocks can be referred to as all-zero blocks (AZBs). Detecting these AZBs in advance can reduce computational and transmission resource consumption. However, this cannot be achieved by existing AZB detection techniques due to the unawareness of the stretching redundancy. In this paper, we first derive a latitude-adaptive redundancy detection (LARD) approach to adaptively detect coefficients carrying redundancy in transformed blocks by modeling the dependency between valid frequency range and the stretching degree based on spectrum analysis. Utilizing LARD, a latitude-redundancy-aware AZB detection scheme tailored for fast 360-degree video coding (LRAS) is proposed to accelerate the encoding process. LRAS consists of three sequential stages: latitude-adaptive AZB (L-AZB) detection, latitude-adaptive genuine-AZB (LG-AZB) detection and latitude-adaptive pseudo-AZB (LP-AZB) detection. Specifically, L-AZB refers to the AZB introduced by projection. LARD is used to detect L-AZB directly. LG-AZB refers to the AZB after hard-decision quantization and zeroing redundant coefficients. A novel latitude-adaptive sum of absolute difference estimation model is built to derive the threshold for LG-AZB detection. LP-AZB refers to the AZB in terms of rate-distortion optimization considering redundancy. A latitude-adaptive rate-distortion model is established for LP-AZB detection. Experimental results show that LRAS can achieve an average total encoding time reduction of 25.85% and 20.38% under low-delay and random access configurations compared to the original HEVC encoder, with only 0.16% and 0.13% BDBR increases and 0.01dB BDPSNR loss, respectively. The transform and quantization time savings are 60.13% and 59.94% on average.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"6129-6142"},"PeriodicalIF":0.0,"publicationDate":"2024-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142487458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
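In an equirectangular projection, a row at latitude phi is horizontally stretched by roughly 1/cos(phi), so horizontal frequencies above a latitude-dependent cutoff carry no information on the sphere. The sketch below zeroes such coefficients and flags a block as an all-zero block if nothing significant survives. It is a simplified illustration under an assumed cos(phi) valid-range model and an arbitrary dead-zone threshold; the paper's LG-AZB and LP-AZB rate-distortion models are not reproduced.

import numpy as np

def latitude_valid_fraction(latitude_deg):
    """Fraction of horizontal frequencies that remain meaningful after ERP stretching:
    only about cos(phi) of the horizontal spectrum carries sphere-domain information
    (simplified model)."""
    return max(np.cos(np.radians(latitude_deg)), 1e-3)

def is_latitude_aware_azb(coeff_block, latitude_deg, dead_zone=1.0):
    """Zero horizontal-frequency coefficients outside the latitude-dependent valid range,
    then declare an all-zero block (AZB) if every remaining coefficient would quantize
    to zero. coeff_block: (N, N) transform coefficients; thresholding is illustrative."""
    n = coeff_block.shape[1]
    cutoff = int(round(n * latitude_valid_fraction(latitude_deg)))
    kept = coeff_block.copy()
    kept[:, cutoff:] = 0.0                          # redundant horizontal frequencies
    return bool(np.all(np.abs(kept) < dead_zone)), kept

block = np.random.randn(8, 8) * 0.5
azb, _ = is_latitude_aware_azb(block, latitude_deg=80.0)
print(azb)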
Toward Blind Flare Removal Using Knowledge-Driven Flare-Level Estimator
Haoyou Deng;Lida Li;Feng Zhang;Zhiqiang Li;Bin Xu;Qingbo Lu;Changxin Gao;Nong Sang
{"title":"Toward Blind Flare Removal Using Knowledge-Driven Flare-Level Estimator","authors":"Haoyou Deng;Lida Li;Feng Zhang;Zhiqiang Li;Bin Xu;Qingbo Lu;Changxin Gao;Nong Sang","doi":"10.1109/TIP.2024.3480696","DOIUrl":"10.1109/TIP.2024.3480696","url":null,"abstract":"Lens flare is a common phenomenon when strong light rays arrive at the camera sensor and a clean scene is consequently mixed up with various opaque and semi-transparent artifacts. Existing deep learning methods are always constrained with limited real image pairs for training. Though recent synthesis-based approaches are found effective, synthesized pairs still deviate from the real ones as the mixing mechanism of flare artifacts and scenes in the wild always depends on a line of undetermined factors, such as lens structure, scratches, etc. In this paper, we present a new perspective from the blind nature of the flare removal task in a knowledge-driven manner. Specifically, we present a simple yet effective flare-level estimator to predict the corruption level of a flare-corrupted image. The estimated flare-level can be interpreted as additive information of the gap between corrupted images and their flare-free correspondences to facilitate a network at both training and testing stages adaptively. Besides, we utilize a flare-level modulator to better integrate the estimations into networks. We also devise a flare-aware block for more accurate flare recognition and reconstruction. Additionally, we collect a new real-world flare dataset for benchmarking, namely WiderFlare. Extensive experiments on three benchmark datasets demonstrate that our method outperforms state-of-the-art methods quantitatively and qualitatively.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"6114-6128"},"PeriodicalIF":0.0,"publicationDate":"2024-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142486818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
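One common way to inject a scalar corruption estimate into a restoration network is a feature-wise scale-and-shift; the sketch below conditions a feature map on a predicted flare level in that style. The FiLM-like modulation form, the MLP size, and the module name are assumptions made for illustration; the paper's actual flare-level estimator and modulator designs are not specified in the abstract.

import torch
import torch.nn as nn

class FlareLevelModulator(nn.Module):
    """Sketch: modulate a feature map with a scalar flare-level estimate via a
    FiLM-style scale and shift (the specific modulation form is an assumption)."""
    def __init__(self, channels, hidden=32):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * channels))

    def forward(self, feat, flare_level):            # feat: (B, C, H, W); flare_level: (B, 1)
        scale, shift = self.mlp(flare_level).chunk(2, dim=1)
        scale = scale[..., None, None]
        shift = shift[..., None, None]
        return feat * (1 + scale) + shift            # corruption-aware feature modulation

feat = torch.randn(2, 64, 32, 32)
level = torch.rand(2, 1)                             # predicted corruption level in [0, 1]
print(FlareLevelModulator(64)(feat, level).shape)    # torch.Size([2, 64, 32, 32])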
Nonparametric Clustering-Guided Cross-View Contrastive Learning for Partially View-Aligned Representation Learning
Shengsheng Qian;Dizhan Xue;Jun Hu;Huaiwen Zhang;Changsheng Xu
{"title":"Nonparametric Clustering-Guided Cross-View Contrastive Learning for Partially View-Aligned Representation Learning","authors":"Shengsheng Qian;Dizhan Xue;Jun Hu;Huaiwen Zhang;Changsheng Xu","doi":"10.1109/TIP.2024.3480701","DOIUrl":"10.1109/TIP.2024.3480701","url":null,"abstract":"With the increasing availability of multi-view data, multi-view representation learning has emerged as a prominent research area. However, collecting strictly view-aligned data is usually expensive, and learning from both aligned and unaligned data can be more practicable. Therefore, Partially View-aligned Representation Learning (PVRL) has recently attracted increasing attention. After aligning multi-view representations based on their semantic similarity, the aligned representations can be utilized to facilitate downstream tasks, such as clustering. However, existing methods may be constrained by the following limitations: 1) They learn semantic relations across views using the known correspondences, which is incomplete and the existence of false negative pairs (FNP) can significantly impact the learning effectiveness; 2) Existing strategies for alleviating the impact of FNP are too intuitive and lack a theoretical explanation of their applicable conditions; 3) They attempt to find FNP based on distance in the common space and fail to explore semantic relations between multi-view data. In this paper, we propose a Nonparametric Clustering-guided Cross-view Contrastive Learning (NC3L) for PVRL, in order to address the above issues. Firstly, we propose to estimate the similarity matrix between multi-view data in the marginal cross-view contrastive loss to approximate the similarity matrix of supervised contrastive learning (CL). Secondly, we establish the theoretical foundation for our proposed method by analyzing the error bounds of the loss function and its derivatives between our method and supervised CL. Thirdly, we propose a Deep Variational Nonparametric Clustering (DeepVNC) by designing a deep reparameterized variational inference for Dirichlet process Gaussian mixture models to construct cluster-level similarity between multi-view data and discover FNP. Additionally, we propose a reparameterization trick to improve the robustness and the performance of our proposed CL method. Extensive experiments on four widely used benchmark datasets show the superiority of our proposed method compared with state-of-the-art methods.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"6158-6172"},"PeriodicalIF":0.0,"publicationDate":"2024-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142486817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
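The idea of replacing hard pairwise correspondences with an estimated similarity matrix inside a cross-view contrastive loss can be sketched as a soft-target InfoNCE: the targets are a row-stochastic similarity estimate rather than the identity matrix, so likely false negatives are down-weighted. This is a rough sketch only, under assumed inputs; it is not the paper's marginal cross-view loss, and DeepVNC clustering is not reproduced.

import torch
import torch.nn.functional as F

def similarity_weighted_cross_view_loss(z1, z2, est_sim, temperature=0.1):
    """Cross-view contrastive loss whose targets are an estimated similarity matrix
    instead of a hard identity alignment. z1, z2: (N, D) view embeddings;
    est_sim: (N, N) row-stochastic similarity estimate (e.g. from clustering)."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature
    return -(est_sim * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

z1, z2 = torch.randn(16, 32), torch.randn(16, 32)
est_sim = torch.softmax(torch.randn(16, 16), dim=1)   # stand-in for a cluster-level estimate
print(similarity_weighted_cross_view_loss(z1, z2, est_sim))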
Transforming Image Super-Resolution: A ConvFormer-Based Efficient Approach
Gang Wu;Junjun Jiang;Junpeng Jiang;Xianming Liu
{"title":"Transforming Image Super-Resolution: A ConvFormer-Based Efficient Approach","authors":"Gang Wu;Junjun Jiang;Junpeng Jiang;Xianming Liu","doi":"10.1109/TIP.2024.3477350","DOIUrl":"10.1109/TIP.2024.3477350","url":null,"abstract":"Recent progress in single-image super-resolution (SISR) has achieved remarkable performance, yet the computational costs of these methods remain a challenge for deployment on resource-constrained devices. In particular, transformer-based methods, which leverage self-attention mechanisms, have led to significant breakthroughs but also introduce substantial computational costs. To tackle this issue, we introduce the Convolutional Transformer layer (ConvFormer) and propose a ConvFormer-based Super-Resolution network (CFSR), offering an effective and efficient solution for lightweight image super-resolution. The proposed method inherits the advantages of both convolution-based and transformer-based approaches. Specifically, CFSR utilizes large kernel convolutions as a feature mixer to replace the self-attention module, efficiently modeling long-range dependencies and extensive receptive fields with minimal computational overhead. Furthermore, we propose an edge-preserving feed-forward network (EFN) designed to achieve local feature aggregation while effectively preserving high-frequency information. Extensive experiments demonstrate that CFSR strikes an optimal balance between computational cost and performance compared to existing lightweight SR methods. When benchmarked against state-of-the-art methods such as ShuffleMixer, the proposed CFSR achieves a gain of 0.39 dB on the Urban100 dataset for the x2 super-resolution task while requiring 26% and 31% fewer parameters and FLOPs, respectively. The code and pre-trained models are available at \u0000<uri>https://github.com/Aitical/CFSR</uri>\u0000.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"6071-6082"},"PeriodicalIF":0.0,"publicationDate":"2024-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142449611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
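Using a large-kernel depthwise convolution as the token mixer in place of self-attention can be shown with a compact transformer-style block. The sketch below follows that general idea; the kernel size, expansion ratio, absence of normalization, and the plain pointwise feed-forward path are illustrative assumptions, not the paper's exact ConvFormer layer or edge-preserving EFN.

import torch
import torch.nn as nn

class ConvFormerBlock(nn.Module):
    """Sketch of a transformer-style block whose token mixer is a large-kernel depthwise
    convolution instead of self-attention (sizes are assumptions, not CFSR's config)."""
    def __init__(self, dim, kernel=13, expansion=2):
        super().__init__()
        self.mixer = nn.Sequential(
            nn.Conv2d(dim, dim, kernel, padding=kernel // 2, groups=dim),  # large receptive field
            nn.Conv2d(dim, dim, 1),
        )
        self.ffn = nn.Sequential(                        # pointwise feed-forward network
            nn.Conv2d(dim, dim * expansion, 1), nn.GELU(),
            nn.Conv2d(dim * expansion, dim, 1),
        )

    def forward(self, x):
        x = x + self.mixer(x)                            # long-range mixing without attention
        return x + self.ffn(x)                           # local channel mixing

x = torch.randn(1, 48, 64, 64)
print(ConvFormerBlock(48)(x).shape)                      # torch.Size([1, 48, 64, 64])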