IEEE Transactions on Pattern Analysis and Machine Intelligence: Latest Articles

Mine yOur owN Anatomy: Revisiting Medical Image Segmentation With Extremely Limited Labels
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-09-13. DOI: 10.1109/TPAMI.2024.3461321
Chenyu You;Weicheng Dai;Fenglin Liu;Yifei Min;Nicha C. Dvornek;Xiaoxiao Li;David A. Clifton;Lawrence Staib;James S. Duncan
{"title":"Mine yOur owN Anatomy: Revisiting Medical Image Segmentation With Extremely Limited Labels","authors":"Chenyu You;Weicheng Dai;Fenglin Liu;Yifei Min;Nicha C. Dvornek;Xiaoxiao Li;David A. Clifton;Lawrence Staib;James S. Duncan","doi":"10.1109/TPAMI.2024.3461321","DOIUrl":"10.1109/TPAMI.2024.3461321","url":null,"abstract":"Recent studies on contrastive learning have achieved remarkable performance solely by leveraging few labels in medical image segmentation. Existing methods mainly focus on instance discrimination and invariant mapping. However, they face three common pitfalls: (1) tailness: medical image data usually follows an implicit long-tail class distribution. Blindly leveraging all pixels in training hence can lead to the data imbalance issues, and cause deteriorated performance; (2) consistency: it remains unclear whether a segmentation model has learned meaningful and yet consistent anatomical features due to the intra-class variations between different anatomical features; and (3) diversity: the intra-slice correlations within the entire dataset have received significantly less attention. This motivates us to seek a principled approach for strategically making use of the dataset itself to discover similar yet distinct samples from different anatomical views. In this paper, we introduce a novel semi-supervised medical image segmentation framework termed Mine y\u0000<bold>O</b>\u0000ur ow\u0000<bold>N</b>\u0000 Anatomy (\u0000<sc>MONA</small>\u0000), and make three contributions. First, prior work argues that every pixel equally matters to the training; we observe empirically that this alone is unlikely to define meaningful anatomical features, mainly due to lacking the supervision signal. We show two simple solutions towards learning invariances. Second, we construct a set of objectives that encourage the model to be capable of decomposing medical images into a collection of anatomical features in an unsupervised manner. Lastly, we both empirically and theoretically, demonstrate the efficacy of our \u0000<sc>MONA</small>\u0000 on three benchmark datasets with different labeled settings, achieving new state-of-the-art under different labeled semi-supervised settings.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"46 12","pages":"11136-11151"},"PeriodicalIF":0.0,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142231578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
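Methods in this family rest on a pixel-level contrastive objective. Below is a minimal PyTorch sketch of such an InfoNCE-style loss over pixel embeddings, assuming positives and negatives have already been mined (e.g., by anatomical similarity); the function and its sampling interface are illustrative, not MONA's released code.

```python
import torch
import torch.nn.functional as F

def pixel_contrastive_loss(feats, pos_idx, neg_idx, tau=0.1):
    """InfoNCE-style loss over pixel embeddings (sketch).

    feats:   (N, D) pixel embeddings
    pos_idx: (N,)   index of one positive pixel per anchor
    neg_idx: (N, K) indices of K negative pixels per anchor
    """
    feats = F.normalize(feats, dim=1)
    anchor = feats                                        # (N, D)
    pos = feats[pos_idx]                                  # (N, D)
    neg = feats[neg_idx]                                  # (N, K, D)

    l_pos = (anchor * pos).sum(dim=1, keepdim=True) / tau # (N, 1)
    l_neg = torch.einsum("nd,nkd->nk", anchor, neg) / tau # (N, K)
    logits = torch.cat([l_pos, l_neg], dim=1)             # (N, 1+K)
    # The positive sits at column 0, so the "class" label is always 0.
    labels = torch.zeros(len(feats), dtype=torch.long, device=feats.device)
    return F.cross_entropy(logits, labels)
```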
Tuning Vision-Language Models With Multiple Prototypes Clustering
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-09-13. DOI: 10.1109/TPAMI.2024.3460180
Meng-Hao Guo;Yi Zhang;Tai-Jiang Mu;Sharon X. Huang;Shi-Min Hu
{"title":"Tuning Vision-Language Models With Multiple Prototypes Clustering","authors":"Meng-Hao Guo;Yi Zhang;Tai-Jiang Mu;Sharon X. Huang;Shi-Min Hu","doi":"10.1109/TPAMI.2024.3460180","DOIUrl":"10.1109/TPAMI.2024.3460180","url":null,"abstract":"Benefiting from advances in large-scale pre-training, foundation models, have demonstrated remarkable capability in the fields of natural language processing, computer vision, among others. However, to achieve expert-level performance in specific applications, such models often need to be fine-tuned with domain-specific knowledge. In this paper, we focus on enabling vision-language models to unleash more potential for visual understanding tasks under few-shot tuning. Specifically, we propose a novel adapter, dubbed as lusterAdapter, which is based on trainable multiple prototypes clustering algorithm, for tuning the CLIP model. It can not only alleviate the concern of catastrophic forgetting of foundation models by introducing anchors to inherit common knowledge, but also improve the utilization efficiency of few annotated samples via bringing in clustering and domain priors, thereby improving the performance of few-shot tuning. We have conducted extensive experiments on 11 common classification benchmarks. The results show our method significantly surpasses the original CLIP and achieves state-of-the-art (SOTA) performance under all benchmarks and settings. For example, under the 16-shot setting, our method exhibits a remarkable improvement over the original CLIP by 19.6%, and also surpasses TIP-Adapter and GraphAdapter by 2.7% and 2.2%, respectively, in terms of average accuracy across the 11 benchmarks.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"46 12","pages":"11186-11199"},"PeriodicalIF":0.0,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142231615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
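As a rough illustration of the prototype idea, here is a hedged PyTorch sketch of an adapter that scores a frozen CLIP image feature against multiple learnable prototypes per class and blends the result with the frozen zero-shot logits (the "anchor" knowledge). The class name, random prototype initialization, and blending rule are assumptions of this sketch, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiPrototypeAdapter(nn.Module):
    """Per-class prototype bank on top of frozen CLIP features (illustrative)."""
    def __init__(self, num_classes, num_protos, dim, text_anchors, alpha=0.5):
        super().__init__()
        # Trainable prototypes, randomly initialized here; the paper clusters
        # few-shot features instead -- an assumption of this sketch.
        self.protos = nn.Parameter(torch.randn(num_classes, num_protos, dim))
        # Frozen text embeddings act as anchors that preserve common knowledge.
        self.register_buffer("text_anchors", F.normalize(text_anchors, dim=-1))
        self.alpha = alpha

    def forward(self, img_feat):                 # img_feat: (B, D), frozen CLIP output
        x = F.normalize(img_feat, dim=-1)
        p = F.normalize(self.protos, dim=-1)     # (C, M, D)
        sim = torch.einsum("bd,cmd->bcm", x, p)  # similarity to every prototype
        proto_logits = sim.max(dim=-1).values    # best prototype per class: (B, C)
        zs_logits = x @ self.text_anchors.t()    # frozen zero-shot logits: (B, C)
        return self.alpha * proto_logits + (1 - self.alpha) * zs_logits
```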
Low-Dimensional Gradient Helps Out-of-Distribution Detection
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-09-12. DOI: 10.1109/TPAMI.2024.3459988
Yingwen Wu;Tao Li;Xinwen Cheng;Jie Yang;Xiaolin Huang
{"title":"Low-Dimensional Gradient Helps Out-of-Distribution Detection","authors":"Yingwen Wu;Tao Li;Xinwen Cheng;Jie Yang;Xiaolin Huang","doi":"10.1109/TPAMI.2024.3459988","DOIUrl":"10.1109/TPAMI.2024.3459988","url":null,"abstract":"Detecting out-of-distribution (OOD) samples is essential for ensuring the reliability of deep neural networks (DNNs) in real-world scenarios. While previous research has predominantly investigated the disparity between in-distribution (ID) and OOD data through forward information analysis, the discrepancy in parameter gradients during the backward process of DNNs has received insufficient attention. Existing studies on gradient disparities mainly focus on the utilization of gradient norms, neglecting the wealth of information embedded in gradient directions. To bridge this gap, in this paper, we conduct a comprehensive investigation into leveraging the entirety of gradient information for OOD detection. The primary challenge arises from the high dimensionality of gradients due to the large number of network parameters. To solve this problem, we propose performing linear dimension reduction on the gradient using a designated subspace that comprises principal components. This innovative technique enables us to obtain a low-dimensional representation of the gradient with minimal information loss. Subsequently, by integrating the reduced gradient with various existing detection score functions, our approach demonstrates superior performance across a wide range of detection tasks. For instance, on the ImageNet benchmark with ResNet50 model, our method achieves an average reduction of 11.15\u0000<inline-formula><tex-math>$%$</tex-math></inline-formula>\u0000 in the false positive rate at 95\u0000<inline-formula><tex-math>$%$</tex-math></inline-formula>\u0000 recall (FPR95) compared to the current state-of-the-art approach.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"46 12","pages":"11378-11391"},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142174688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
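The core recipe, projecting per-sample gradients onto a principal subspace fitted on ID data and then fusing the low-dimensional representation with an existing score, can be sketched as follows in PyTorch. The additive fusion rule and the use of torch.pca_lowrank are illustrative assumptions, not the authors' exact procedure.

```python
import torch

def fit_gradient_subspace(grad_bank, k=64):
    """grad_bank: (N, P) per-sample gradients from ID data
    (e.g., of the loss w.r.t. the last layer only, to keep P manageable)."""
    # Top-k principal directions of the ID gradient distribution.
    _, _, V = torch.pca_lowrank(grad_bank, q=k)   # V: (P, k)
    return V

def ood_score(grad, V, base_score):
    """Combine a low-dimensional gradient representation with an existing
    detection score. Here: norm of the projected gradient, added to the
    base score -- the fusion rule is an assumption of this sketch."""
    z = grad @ V                                   # (P,) -> (k,)
    return base_score + z.norm()
```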
ES-GNN: Generalizing Graph Neural Networks Beyond Homophily With Edge Splitting
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-09-12. DOI: 10.1109/TPAMI.2024.3459932
Jingwei Guo;Kaizhu Huang;Rui Zhang;Xinping Yi
{"title":"ES-GNN: Generalizing Graph Neural Networks Beyond Homophily With Edge Splitting","authors":"Jingwei Guo;Kaizhu Huang;Rui Zhang;Xinping Yi","doi":"10.1109/TPAMI.2024.3459932","DOIUrl":"10.1109/TPAMI.2024.3459932","url":null,"abstract":"While Graph Neural Networks (GNNs) have achieved enormous success in multiple graph analytical tasks, modern variants mostly rely on the strong inductive bias of homophily. However, real-world networks typically exhibit both homophilic and heterophilic linking patterns, wherein adjacent nodes may share dissimilar attributes and distinct labels. Therefore, GNNs smoothing node proximity holistically may aggregate both task-relevant and irrelevant (even harmful) information, limiting their ability to generalize to heterophilic graphs and potentially causing non-robustness. In this work, we propose a novel Edge Splitting GNN (ES-GNN) framework to adaptively distinguish between graph edges either relevant or irrelevant to learning tasks. This essentially transfers the original graph into two subgraphs with the same node set but complementary edge sets dynamically. Given that, information propagation separately on these subgraphs and edge splitting are alternatively conducted, thus disentangling the task-relevant and irrelevant features. Theoretically, we show that our ES-GNN can be regarded as a solution to a \u0000<italic>disentangled graph denoising problem</i>\u0000, which further illustrates our motivations and interprets the improved generalization beyond homophily. Extensive experiments over 11 benchmark and 1 synthetic datasets not only demonstrate the effective performance of ES-GNN but also highlight its robustness to adversarial graphs and mitigation of the over-smoothing problem.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"46 12","pages":"11345-11360"},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142174687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
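A hedged sketch of one edge-splitting layer: an edge scorer assigns each edge a relevance weight, and features are then propagated separately on the relevant subgraph and its complement. The concatenation-based scorer and sum aggregation are assumptions of this sketch, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class EdgeSplitLayer(nn.Module):
    """One ES-GNN-style layer (sketch): score each edge, then propagate
    features separately on the 'relevant' and 'irrelevant' subgraphs."""
    def __init__(self, dim):
        super().__init__()
        self.edge_scorer = nn.Linear(2 * dim, 1)
        self.lin_r = nn.Linear(dim, dim)   # task-relevant channel
        self.lin_i = nn.Linear(dim, dim)   # task-irrelevant channel

    def forward(self, x, edge_index):      # x: (N, dim), edge_index: (2, E) COO
        src, dst = edge_index
        # Relevance weight in (0, 1) for each edge.
        e = torch.sigmoid(self.edge_scorer(torch.cat([x[src], x[dst]], dim=-1)))

        def propagate(weights, lin):
            msg = weights * lin(x)[src]    # weighted messages along edges
            out = torch.zeros_like(x)
            out.index_add_(0, dst, msg)    # sum messages at destination nodes
            return out

        h_r = propagate(e, self.lin_r)         # relevant subgraph
        h_i = propagate(1.0 - e, self.lin_i)   # complementary subgraph
        return h_r, h_i
```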
Weakly-Supervised Depth Estimation and Image Deblurring via Dual-Pixel Sensors
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-09-12. DOI: 10.1109/TPAMI.2024.3458974
Liyuan Pan;Richard Hartley;Liu Liu;Zhiwei Xu;Shah Chowdhury;Yan Yang;Hongguang Zhang;Hongdong Li;Miaomiao Liu
{"title":"Weakly-Supervised Depth Estimation and Image Deblurring via Dual-Pixel Sensors","authors":"Liyuan Pan;Richard Hartley;Liu Liu;Zhiwei Xu;Shah Chowdhury;Yan Yang;Hongguang Zhang;Hongdong Li;Miaomiao Liu","doi":"10.1109/TPAMI.2024.3458974","DOIUrl":"10.1109/TPAMI.2024.3458974","url":null,"abstract":"Dual-pixel (DP) imaging sensors are getting more popularly adopted by modern cameras. A DP camera captures a pair of images in a single snapshot by splitting each pixel in half. Several previous studies show how to recover depth information by treating the DP pair as an approximate stereo pair. However, dual-pixel disparity occurs only in image regions with defocus blur which is unlike classic stereo disparity. Heavy defocus blur in DP pairs affects the performance of depth estimation approaches based on matching. Therefore, we treat the blur removal and the depth estimation as a joint problem. We investigate the formation of the DP pair, which links the blur and depth information, rather than blindly removing the blur effect. We propose a mathematical DP model that can improve depth estimation by the blur. This exploration motivated us to propose our previous work, an end-to-end DDDNet (DP-based Depth and Deblur Network), which jointly estimates depth and restores the image in a supervised fashion. However, collecting the ground-truth (GT) depth map for the DP pair is challenging and limits the depth estimation potential of the DP sensor. Therefore, we propose an extension of the DDDNet, called WDDNet (Weakly-supervised Depth and Deblur Network), which includes an efficient reblur solver that does not require GT depth maps for training. To achieve this, we convert all-in-focus images into supervisory signals for unsupervised depth estimation in our WDDNet. We jointly estimate an all-in-focus image and a disparity map, then use a \u0000<italic>Reblur</i>\u0000 and \u0000<italic>Fstack</i>\u0000 module to regularize the disparity estimation and image restoration. We conducted extensive experiments on synthetic and real data to demonstrate the competitive performance of our method when compared to state-of-the-art (SOTA) supervised approaches.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"46 12","pages":"11314-11330"},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142174686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
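The reblur idea, re-synthesizing defocus blur from a predicted all-in-focus image and disparity map so that blur consistency supervises depth without GT depth, can be sketched by blending a few globally blurred copies per pixel. The discrete blur levels and soft assignment below are assumptions of this sketch, not WDDNet's actual solver.

```python
import torch
import torch.nn.functional as F

def gaussian_kernel(sigma, ksize=11):
    ax = torch.arange(ksize) - ksize // 2
    g = torch.exp(-(ax.float() ** 2) / (2 * sigma ** 2))
    k = torch.outer(g, g)
    return (k / k.sum()).view(1, 1, ksize, ksize)

def reblur(aif, disparity, sigmas=(0.5, 1.0, 2.0, 4.0)):
    """Differentiable reblur (sketch): blur the all-in-focus image at a few
    discrete defocus levels, then blend per pixel by |disparity|.

    aif: (B, C, H, W); disparity: (B, 1, H, W), assumed scaled so that
    |disparity| lies in the sigma range -- an assumption of this sketch.
    """
    C = aif.shape[1]
    stack = []
    for s in sigmas:
        k = gaussian_kernel(s).to(aif).repeat(C, 1, 1, 1)   # depthwise kernel
        stack.append(F.conv2d(aif, k, padding=k.shape[-1] // 2, groups=C))
    stack = torch.stack(stack, dim=0)                       # (S, B, C, H, W)

    # Soft assignment of each pixel to the nearest blur level.
    levels = torch.tensor(sigmas, device=aif.device).view(-1, 1, 1, 1, 1)
    w = F.softmax(-(disparity.abs().unsqueeze(0) - levels) ** 2, dim=0)
    return (w * stack).sum(dim=0)                           # (B, C, H, W)
```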
Label Deconvolution for Node Representation Learning on Large-Scale Attributed Graphs Against Learning Bias
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-09-12. DOI: 10.1109/TPAMI.2024.3459408
Zhihao Shi;Jie Wang;Fanghua Lu;Hanzhu Chen;Defu Lian;Zheng Wang;Jieping Ye;Feng Wu
{"title":"Label Deconvolution for Node Representation Learning on Large-Scale Attributed Graphs Against Learning Bias","authors":"Zhihao Shi;Jie Wang;Fanghua Lu;Hanzhu Chen;Defu Lian;Zheng Wang;Jieping Ye;Feng Wu","doi":"10.1109/TPAMI.2024.3459408","DOIUrl":"10.1109/TPAMI.2024.3459408","url":null,"abstract":"Node representation learning on attributed graphs—whose nodes are associated with rich attributes (e.g., texts and protein sequences)—plays a crucial role in many important downstream tasks. To encode the attributes and graph structures simultaneously, recent studies integrate pre-trained models with graph neural networks (GNNs), where pre-trained models serve as node encoders (NEs) to encode the attributes. As jointly training large NEs and GNNs on large-scale graphs suffers from severe scalability issues, many methods propose to train NEs and GNNs separately. Consequently, they do not take feature convolutions in GNNs into consideration in the training phase of NEs, leading to a significant learning bias relative to the joint training. To address this challenge, we propose an efficient label regularization technique, namely \u0000<bold>L</b>\u0000abel \u0000<bold>D</b>\u0000econvolution (LD), to alleviate the learning bias by a novel and highly scalable approximation to the inverse mapping of GNNs. The inverse mapping leads to an objective function that is equivalent to that by the joint training, while it can effectively incorporate GNNs in the training phase of NEs against the learning bias. More importantly, we show that LD converges to the optimal objective function values by the joint training under mild assumptions. Experiments demonstrate LD significantly outperforms state-of-the-art methods on Open Graph Benchmark datasets.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"46 12","pages":"11273-11286"},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142174690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
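Conceptually, LD trains the node encoder against labels that have been passed through an approximate inverse of the GNN's feature propagation, so that the NE can be trained alone without the GNN in the loop. A heavily hedged sketch, using a truncated Neumann series as the approximate inverse; the paper's scalable parameterization differs.

```python
import torch

def deconvolve_labels(Y, A_hat, num_steps=3, alpha=0.5):
    """Illustrative 'label deconvolution': pre-apply an approximate inverse
    of a propagation operator (I + alpha * A_hat) to the label matrix.

    Y:     (N, C) float one-hot labels
    A_hat: (N, N) normalized adjacency (assumed spectral norm < 1/alpha)

    Uses the truncated Neumann series (I + aA)^{-1} = sum_k (-aA)^k --
    a textbook approximation standing in for the paper's exact scheme.
    """
    Z = Y.clone()
    term = Y.clone()
    for _ in range(num_steps):
        term = -alpha * (A_hat @ term)   # next Neumann-series term
        Z = Z + term
    return Z                              # targets for training the NE alone
```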
Learning From Human Attention for Attribute-Assisted Visual Recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-09-11. DOI: 10.1109/TPAMI.2024.3458921
Xiao Bai;Pengcheng Zhang;Xiaohan Yu;Jin Zheng;Edwin R. Hancock;Jun Zhou;Lin Gu
{"title":"Learning From Human Attention for Attribute-Assisted Visual Recognition","authors":"Xiao Bai;Pengcheng Zhang;Xiaohan Yu;Jin Zheng;Edwin R. Hancock;Jun Zhou;Lin Gu","doi":"10.1109/TPAMI.2024.3458921","DOIUrl":"10.1109/TPAMI.2024.3458921","url":null,"abstract":"With prior knowledge of seen objects, humans have a remarkable ability to recognize novel objects using shared and distinct local attributes. This is significant for the challenging tasks of zero-shot learning (ZSL) and fine-grained visual classification (FGVC), where the discriminative attributes of objects have played an important role. Inspired by human visual attention, neural networks have widely exploited the attention mechanism to learn the locally discriminative attributes for challenging tasks. Though greatly promoted the development of these fields, existing works mainly focus on learning the region embeddings of different attribute features and neglect the importance of discriminative attribute localization. It is also unclear whether the learned attention truly matches the real human attention. To tackle this problem, this paper proposes to employ real human gaze data for visual recognition networks to learn from human attention. Specifically, we design a unified Attribute Attention Network (A\u0000<inline-formula><tex-math>$^{2}$</tex-math></inline-formula>\u0000Net) that learns from human attention for both ZSL and FGVC tasks. The overall model consists of an attribute attention branch and a baseline classification network. On top of the image feature maps provided by the baseline classification network, the attribute attention branch employs attribute prototypes to produce attribute attention maps and attribute features. The attribute attention maps are converted to gaze-like attentions to be aligned with real human gaze attention. To guarantee the effectiveness of attribute feature learning, we further align the extracted attribute features with attribute-defined class embeddings. To facilitate learning from human gaze attention for the visual recognition problems, we design a bird classification game to collect real human gaze data using the CUB dataset via an eye-tracker device. Experiments on ZSL and FGVC tasks without/with real human gaze data validate the benefits and accuracy of our proposed model. This work supports the promising benefits of collecting human gaze datasets and automatic gaze estimation algorithms learning from human attention for high-level computer vision tasks.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"46 12","pages":"11152-11167"},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142171000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
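The gaze-alignment step, converting attribute attention maps into a gaze-like spatial distribution and matching it against a recorded human gaze heatmap, might be implemented as a KL term like the sketch below; the pooling and loss choice are illustrative, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def gaze_alignment_loss(attn_maps, gaze_map, eps=1e-8):
    """Align model attention with human gaze (sketch).

    attn_maps: (B, A, H, W) attribute attention maps
    gaze_map:  (B, H, W)    non-negative human gaze heatmap from an eye tracker
    Both are reduced to spatial distributions and compared with KL divergence.
    """
    pred = attn_maps.mean(dim=1).flatten(1)           # (B, H*W) pooled attention
    pred = torch.log_softmax(pred, dim=1)             # log-distribution over pixels
    tgt = gaze_map.flatten(1)
    tgt = tgt / (tgt.sum(dim=1, keepdim=True) + eps)  # normalize gaze to sum to 1
    return F.kl_div(pred, tgt, reduction="batchmean")
```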
A Review of Safe Reinforcement Learning: Methods, Theories, and Applications
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-09-10. DOI: 10.1109/TPAMI.2024.3457538
Shangding Gu;Long Yang;Yali Du;Guang Chen;Florian Walter;Jun Wang;Alois Knoll
{"title":"A Review of Safe Reinforcement Learning: Methods, Theories, and Applications","authors":"Shangding Gu;Long Yang;Yali Du;Guang Chen;Florian Walter;Jun Wang;Alois Knoll","doi":"10.1109/TPAMI.2024.3457538","DOIUrl":"10.1109/TPAMI.2024.3457538","url":null,"abstract":"Reinforcement Learning (RL) has achieved tremendous success in many complex decision-making tasks. However, safety concerns are raised during deploying RL in real-world applications, leading to a growing demand for safe RL algorithms, such as in autonomous driving and robotics scenarios. While safe control has a long history, the study of safe RL algorithms is still in the early stages. To establish a good foundation for future safe RL research, in this paper, we provide a review of safe RL from the perspectives of methods, theories, and applications. First, we review the progress of safe RL from five dimensions and come up with five crucial problems for safe RL being deployed in real-world applications, coined as \u0000<italic>“2H3W”</i>\u0000. Second, we analyze the algorithm and theory progress from the perspectives of answering the \u0000<italic>“2H3W”</i>\u0000 problems. Particularly, the sample complexity of safe RL algorithms is reviewed and discussed, followed by an introduction to the applications and benchmarks of safe RL algorithms. Finally, we open the discussion of the challenging problems in safe RL, hoping to inspire future research on this thread. To advance the study of safe RL algorithms, we release an open-sourced repository containing major safe RL algorithms at the link.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"46 12","pages":"11216-11235"},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142166413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
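One standard method family such a review covers is Lagrangian relaxation of a constrained MDP: maximize reward while a learned multiplier penalizes expected cost above a budget. A minimal PyTorch sketch of that textbook recipe follows; it is not code from the surveyed repository.

```python
import torch

class LagrangianSafeRL:
    """Primal-dual updates for a constrained MDP (sketch of one classic
    safe-RL family): maximize reward while keeping expected cost under
    a budget d, via a learned multiplier lambda."""
    def __init__(self, cost_budget, lam_lr=1e-2):
        self.d = cost_budget
        self.log_lam = torch.zeros(1, requires_grad=True)   # lambda = exp(log_lam) > 0
        self.lam_opt = torch.optim.Adam([self.log_lam], lr=lam_lr)

    def policy_loss(self, reward_adv, cost_adv, logp):
        """Policy-gradient surrogate: reward advantage minus lambda-weighted
        cost advantage, scored by action log-probabilities."""
        lam = self.log_lam.exp().detach()
        return -(logp * (reward_adv - lam * cost_adv)).mean()

    def update_multiplier(self, mean_episode_cost):
        """Dual ascent: grow lambda when the cost constraint is violated."""
        lam = self.log_lam.exp()
        dual_loss = -lam * (mean_episode_cost - self.d)
        self.lam_opt.zero_grad()
        dual_loss.backward()
        self.lam_opt.step()
```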
Refining 3D Human Texture Estimation From a Single Image
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-09-10. DOI: 10.1109/TPAMI.2024.3456817
Said Fahri Altindis;Adil Meric;Yusuf Dalva;Uğur Güdükbay;Aysegul Dundar
{"title":"Refining 3D Human Texture Estimation From a Single Image","authors":"Said Fahri Altindis;Adil Meric;Yusuf Dalva;Uğur Güdükbay;Aysegul Dundar","doi":"10.1109/TPAMI.2024.3456817","DOIUrl":"10.1109/TPAMI.2024.3456817","url":null,"abstract":"Estimating 3D human texture from a single image is essential in graphics and vision. It requires learning a mapping function from input images of humans with diverse poses into the parametric (\u0000<italic>uv</i>\u0000) space and reasonably hallucinating invisible parts. To achieve a high-quality 3D human texture estimation, we propose a framework that adaptively samples the input by a deformable convolution where offsets are learned via a deep neural network. Additionally, we describe a novel cycle consistency loss that improves view generalization. We further propose to train our framework with an uncertainty-based pixel-level image reconstruction loss, which enhances color fidelity. We compare our method against the state-of-the-art approaches and show significant qualitative and quantitative improvements.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"46 12","pages":"11464-11475"},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142166414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
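An uncertainty-based pixel-level reconstruction loss of the kind mentioned here is commonly written in the aleatoric-uncertainty form below, where a predicted per-pixel scale down-weights unreliable pixels and a log penalty stops the network from declaring everything uncertain. The paper's exact form may differ, so treat this as a sketch.

```python
import torch

def uncertainty_recon_loss(pred, target, log_b):
    """Uncertainty-weighted L1 reconstruction (sketch, standard aleatoric form).

    pred, target: (B, C, H, W) predicted and reference images
    log_b:        (B, 1, H, W) predicted per-pixel log-scale
    """
    b = log_b.exp()
    # Large b shrinks the residual term but pays the log-b penalty.
    return ((pred - target).abs() / b + log_b).mean()
```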
Deep Single Image Defocus Deblurring via Gaussian Kernel Mixture Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-09-10. DOI: 10.1109/TPAMI.2024.3457856
Yuhui Quan;Zicong Wu;Ruotao Xu;Hui Ji
{"title":"Deep Single Image Defocus Deblurring via Gaussian Kernel Mixture Learning","authors":"Yuhui Quan;Zicong Wu;Ruotao Xu;Hui Ji","doi":"10.1109/TPAMI.2024.3457856","DOIUrl":"10.1109/TPAMI.2024.3457856","url":null,"abstract":"This paper proposes an end-to-end deep learning approach for removing defocus blur from a single defocused image. Defocus blur is a common issue in digital photography that poses a challenge due to its spatially-varying and large blurring effect. The proposed approach addresses this challenge by employing a pixel-wise Gaussian kernel mixture (GKM) model to accurately yet compactly parameterize spatially-varying defocus point spread functions (PSFs), which is motivated by the isotropy in defocus PSFs. We further propose a grouped GKM (GGKM) model that decouples the coefficients in GKM, so as to improve the modeling accuracy with an economic manner. Afterward, a deep neural network called GGKMNet is then developed by unrolling a fixed-point iteration process of GGKM-based image deblurring, which avoids the efficiency issues in existing unrolling DNNs. Using a lightweight scale-recurrent architecture with a coarse-to-fine estimation scheme to predict the coefficients in GGKM, the GGKMNet can efficiently recover an all-in-focus image from a defocused one. Such advantages are demonstrated with extensive experiments on five benchmark datasets, where the GGKMNet outperforms existing defocus deblurring methods in restoration quality, as well as showing advantages in terms of model complexity and computational efficiency.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"46 12","pages":"11361-11377"},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142166409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
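The GKM parameterization makes spatially-varying blur cheap: if every per-pixel PSF is a convex combination of M fixed isotropic Gaussians, blurring reduces to blending M globally blurred images. A PyTorch sketch under that reading; the fixed sigma basis and externally supplied weights are assumptions (in the paper, a network predicts the coefficients).

```python
import torch
import torch.nn.functional as F

def gaussian_basis(sigmas, ksize=15):
    """Stack of M normalized isotropic Gaussian kernels: (M, ksize, ksize)."""
    ax = torch.arange(ksize).float() - ksize // 2
    kernels = []
    for s in sigmas:
        g = torch.exp(-(ax ** 2) / (2 * s ** 2))
        k = torch.outer(g, g)
        kernels.append(k / k.sum())
    return torch.stack(kernels)

def gkm_blur(img, weights, sigmas=(1.0, 2.0, 4.0, 8.0)):
    """Pixel-wise Gaussian-kernel-mixture blur (sketch).

    img:     (B, C, H, W) sharp image
    weights: (B, M, H, W) per-pixel mixture weights, softmaxed over M
    """
    C = img.shape[1]
    basis = gaussian_basis(sigmas).to(img)           # (M, k, k)
    blurred = []
    for m in range(len(sigmas)):
        k = basis[m].view(1, 1, *basis[m].shape).repeat(C, 1, 1, 1)
        blurred.append(F.conv2d(img, k, padding=k.shape[-1] // 2, groups=C))
    blurred = torch.stack(blurred, dim=1)            # (B, M, C, H, W)
    w = weights.unsqueeze(2)                         # (B, M, 1, H, W)
    return (w * blurred).sum(dim=1)                  # (B, C, H, W)
```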