Computer Vision and Image Understanding: Latest Articles

When super-resolution meets camouflaged object detection: A comparison study
IF 4.3 | Tier 3 | Computer Science
Computer Vision and Image Understanding | Pub Date: 2025-02-21 | DOI: 10.1016/j.cviu.2025.104321
Juan Wen, Shupeng Cheng, Weiyan Hou, Luc Van Gool, Radu Timofte
Abstract: Super-resolution (SR) and camouflaged object detection (COD) are two prominent topics in computer vision with various joint applications. However, previous work has studied the two areas in isolation. In this paper, we conduct the first comprehensive comparative evaluation of both. Specifically, we benchmark different super-resolution methods on commonly used COD datasets, and we also evaluate the robustness of different COD models on COD data processed by SR methods. The experiments reveal challenges in preserving semantic information that stem from the differences in targets and features between the two domains: COD relies on extracting semantic information from low-resolution images to identify camouflaged targets, and important semantic details risk being lost or distorted when SR techniques are applied. Balancing the enhancement of spatial resolution with the preservation of semantic information is therefore crucial for maintaining the accuracy of COD algorithms. We propose a new SR model called Dilated Super-resolution (DISR) to enhance SR performance on COD, achieving state-of-the-art results on five commonly used SR datasets, including a 0.38 dB improvement on the Urban100 x4 task. Using low-resolution images processed by DISR for COD enhances target visibility and significantly improves COD performance. Our goal is to leverage the synergies between these two domains, draw insights from the complementarity of techniques in both fields, and provide inspiration for future research in both communities. (Volume 253, Article 104321)
Citations: 0
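The cross-domain protocol this abstract describes (super-resolve low-resolution COD inputs, then measure how a COD model fares on the result) boils down to a short evaluation loop. The sketch below is illustrative only: `sr_model`, `cod_model`, and `loader` are placeholders rather than the paper's DISR model or its exact benchmark, and MAE stands in for the full set of COD metrics.

```python
import torch
import torch.nn.functional as F

def evaluate_cod_with_sr(sr_model, cod_model, loader, device="cuda"):
    """Cross-domain benchmark loop: super-resolve the low-resolution COD inputs,
    then run the COD model and score its predicted masks with mean absolute
    error (MAE), one of the standard COD metrics."""
    sr_model.eval()
    cod_model.eval()
    maes = []
    with torch.no_grad():
        for lr_img, gt_mask in loader:            # lr_img: (B,3,h,w), gt_mask: (B,1,H,W)
            lr_img, gt_mask = lr_img.to(device), gt_mask.to(device)
            sr_img = sr_model(lr_img)             # upscaled image, e.g. x4
            pred = torch.sigmoid(cod_model(sr_img))
            pred = F.interpolate(pred, size=gt_mask.shape[-2:],
                                 mode="bilinear", align_corners=False)
            maes.append((pred - gt_mask).abs().mean().item())
    return sum(maes) / len(maes)
```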
MultiFire20K: A semi-supervised enhanced large-scale UAV-based benchmark for advancing multi-task learning in fire monitoring
IF 4.3 | Tier 3 | Computer Science
Computer Vision and Image Understanding | Pub Date: 2025-02-19 | DOI: 10.1016/j.cviu.2025.104318
Demetris Shianios, Panayiotis Kolios, Christos Kyrkou
Abstract: Effective fire detection and response are crucial to minimizing the widespread damage and loss caused by fires in both urban and natural environments. While advancements in computer vision have enhanced fire detection and response, progress in UAV-based monitoring remains limited due to the lack of comprehensive datasets. This study introduces the MultiFire20K dataset, comprising 20,500 diverse aerial fire images with annotations for fire classification, environment classification, and separate segmentation masks for both fire and smoke, specifically designed to support multi-task learning. Because labeled data in remote sensing is limited, a semi-supervised approach for generating pseudo-labels for fire and smoke masks is explored, one that takes the environment of the event into consideration. We experimented with various segmentation architectures and backbone models to generate reliable pseudo-label masks. Benchmarks were established by evaluating models on fire classification, environment classification, and the segmentation of both fire and smoke, and by comparing these results with those obtained from multi-task models. Our study highlights the substantial advantages of a multi-task approach to fire monitoring, particularly in improving fire and smoke segmentation through knowledge shared during training. This efficiency, combined with the conservation of memory and computational resources, makes the multi-task framework superior for real-time applications, especially compared to using a separate model for each task. We anticipate that our dataset and benchmark results will encourage further research in fire surveillance, advancing fire detection and prevention methods. (Volume 254, Article 104318)
Citations: 0
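The semi-supervised pseudo-labeling mentioned in the abstract can be approximated with plain confidence thresholding. This is a minimal sketch under that assumption; the paper's procedure additionally conditions on the environment class of the event, and `seg_model`, the class layout (background/fire/smoke), and the 0.7 threshold are all hypothetical.

```python
import torch

@torch.no_grad()
def make_pseudo_masks(seg_model, unlabeled_loader, conf_thr=0.7, device="cuda"):
    """Plain confidence-thresholded pseudo-labeling for unlabeled aerial frames:
    keep the model's per-pixel prediction (background / fire / smoke) only where
    its softmax confidence exceeds conf_thr, and mark the rest as ignore (255)."""
    seg_model.eval()
    pseudo_masks = []
    for imgs in unlabeled_loader:                     # imgs: (B, 3, H, W)
        probs = torch.softmax(seg_model(imgs.to(device)), dim=1)
        conf, label = probs.max(dim=1)                # (B, H, W) each
        label[conf < conf_thr] = 255                  # low-confidence pixels are ignored
        pseudo_masks.append(label.cpu())
    return pseudo_masks
```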
Incremental few-shot instance segmentation via feature enhancement and prototype calibration
IF 4.3 | Tier 3 | Computer Science
Computer Vision and Image Understanding | Pub Date: 2025-02-12 | DOI: 10.1016/j.cviu.2025.104317
Weixiang Gao, Caijuan Shi, Rui Wang, Ao Cai, Changyu Duan, Meiqin Liu
Abstract: Incremental few-shot instance segmentation (iFSIS) aims to detect and segment instances of novel classes with only a few training samples, while maintaining performance on base classes without revisiting base-class data. iMTFA, a representative iFSIS method, offers a flexible approach for adding novel classes. Its key mechanism generates novel class weights by normalizing and averaging the embeddings obtained from the K-shot novel instances. However, relying on such a small sample size often leads to an insufficient representation of the real class distribution, which in turn results in biased weights for the novel classes. Furthermore, due to the absence of novel fine-tuning, iMTFA tends to predict potential novel-class foregrounds as background, which exacerbates the bias in the generated novel class weights. To overcome these limitations, we propose a simple but effective iFSIS method, named Enhancement and Calibration-based iMTFA (EC-iMTFA). Specifically, we first design an embedding enhancement and aggregation (EEA) module, which enhances the feature diversity of each novel instance embedding before generating novel class weights. We then design a novel prototype calibration (NPC) module that leverages the well-calibrated base-class and background weights in the classifier to enhance the discriminability of novel class prototypes. In addition, a simple weight preprocessing (WP) mechanism is designed on top of NPC to further improve the calibration process. Extensive experiments on the COCO and VOC datasets demonstrate that EC-iMTFA outperforms iMTFA in terms of iFSIS and iFSOD performance, stability, and efficiency, without requiring novel fine-tuning. Moreover, EC-iMTFA achieves competitive results compared with recent state-of-the-art methods. (Volume 253, Article 104317)
Citations: 0
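The iMTFA weight-generation step that the abstract builds on (normalize and average the K-shot embeddings of a novel class) can be written in a few lines. This is a minimal sketch of that imprinting step only; EC-iMTFA's EEA, NPC, and WP modules are not reproduced here.

```python
import torch
import torch.nn.functional as F

def imprint_novel_weight(support_embeddings):
    """The iMTFA-style step the abstract describes: L2-normalize the embeddings of
    the K support instances of a novel class and average them to form that class's
    weight vector for a cosine-similarity classifier.

    support_embeddings: (K, D) features of the K shots.
    returns:            (D,) unit-norm imprinted weight.
    """
    normed = F.normalize(support_embeddings, dim=1)   # normalize each shot
    weight = normed.mean(dim=0)                       # average over the K shots
    return F.normalize(weight, dim=0)                 # renormalize the class weight

# Usage sketch: stack the imprinted vector under the existing base-class weights,
# e.g. new_weights = torch.cat([base_weights, imprint_novel_weight(emb)[None]], 0)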
Cartoon character recognition based on portrait style fusion
IF 4.3 | Tier 3 | Computer Science
Computer Vision and Image Understanding | Pub Date: 2025-02-10 | DOI: 10.1016/j.cviu.2025.104316
De Li, Zhenyi Jin, Xun Jin
Abstract: In this paper, we propose a cartoon character recognition method that uses portrait characteristics to address the problem of copyright protection in cartoon works. The proposed recognition framework is derived from a content-based retrieval mechanism, providing an effective solution for the copyright identification of cartoon characters. This research makes two core contributions. First, we propose an ECA-based residual attention module to improve the learning of cartoon character features; cartoon character images typically have fewer details and less texture information, and inter-channel information interaction can extract cartoon features more effectively. Second, we propose a style transfer-based cartoon character construction mechanism that creates a simulated plagiarized cartoon character dataset by fusing portrait style and content. Comparative experiments demonstrate that the proposed model effectively improves detection accuracy. Finally, we validate the effectiveness and feasibility of the model by retrieving plagiarized versions of cartoon characters. (Volume 253, Article 104316)
Citations: 0
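For reference, the ECA-style channel attention the first contribution builds on follows the published ECA-Net recipe: global average pooling, a small 1D convolution across channels, and a sigmoid gate. The residual block below is a generic sketch of that recipe, not the paper's exact module; the kernel size and the convolutional body are illustrative.

```python
import torch
import torch.nn as nn

class ECAResidualBlock(nn.Module):
    """Standard ECA channel attention (global average pool -> 1D conv across
    channels -> sigmoid) applied inside a residual block."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1))
        self.eca = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        y = self.body(x)                                   # (B, C, H, W)
        w = y.mean(dim=(2, 3))                             # global average pool -> (B, C)
        w = self.eca(w.unsqueeze(1)).squeeze(1)            # local cross-channel interaction
        w = torch.sigmoid(w).unsqueeze(-1).unsqueeze(-1)   # (B, C, 1, 1) channel gates
        return x + y * w                                   # residual connection
```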
A multi-modal explainability approach for human-aware robots in multi-party conversation
IF 4.3 | Tier 3 | Computer Science
Computer Vision and Image Understanding | Pub Date: 2025-02-04 | DOI: 10.1016/j.cviu.2025.104304
Iveta Bečková, Štefan Pócoš, Giulia Belgiovine, Marco Matarese, Omar Eldardeer, Alessandra Sciutti, Carlo Mazzola
Abstract: Addressee estimation (understanding to whom somebody is talking) is a fundamental task for human activity recognition in multi-party conversation scenarios. In the field of human-robot interaction specifically, it becomes even more crucial to enable social robots to participate in such interactive contexts. However, it is usually implemented as a binary classification task, restricting the robot's capability to estimating whether it was addressed or not, which limits its interactive skills. For a social robot to gain the trust of humans, it is also important to manifest a certain level of transparency and explainability. Explainable artificial intelligence thus plays a significant role in current machine learning applications and models, providing explanations for their decisions alongside excellent performance. In our work, we (a) present an addressee estimation model with improved performance compared with the previous state of the art; (b) further modify this model to include inherently explainable attention-based segments; (c) implement the explainable addressee estimation as part of a modular cognitive architecture for multi-party conversation in an iCub robot; (d) validate the real-time performance of the explainable model in multi-party human-robot interaction; (e) propose several ways to incorporate explainability and transparency in the aforementioned architecture; and (f) perform an online user study to analyze the effect of various explanations on how human participants perceive the robot. (Volume 253, Article 104304)
Citations: 0
Exploring plain ViT features for multi-class unsupervised visual anomaly detection
IF 4.3 | Tier 3 | Computer Science
Computer Vision and Image Understanding | Pub Date: 2025-02-04 | DOI: 10.1016/j.cviu.2025.104308
Jiangning Zhang, Xuhai Chen, Yabiao Wang, Chengjie Wang, Yong Liu, Xiangtai Li, Ming-Hsuan Yang, Dacheng Tao
Abstract: This work studies a challenging and practical problem known as multi-class unsupervised anomaly detection (MUAD), which requires only normal images for training while testing both normal and anomalous images across multiple classes. Existing reconstruction-based methods typically adopt pyramidal networks as encoders and decoders to obtain multi-resolution features, often involving complex sub-modules with extensive handcrafted engineering. In contrast, the plain Vision Transformer (ViT), with its simpler and more straightforward architecture, has proven effective in multiple domains, including detection and segmentation tasks. Following this spirit, we explore the use of only plain ViT features for MUAD. We first abstract a Meta-AD concept by synthesizing current reconstruction-based methods. We then instantiate a novel ViT-based ViTAD structure, designed incrementally from both global and local perspectives, which provides a strong baseline to facilitate future research. In addition, the paper uncovers several intriguing findings for further investigation. Finally, we comprehensively and fairly benchmark various approaches using seven metrics and their average. Using a basic training regimen with only an MSE loss, ViTAD achieves state-of-the-art results and efficiency on the MVTec AD, VisA, and Uni-Medical datasets: for example, it reaches 85.4 mAD on MVTec AD, surpassing UniAD by +3.0, and requires only 1.1 h and 2.3 GB of GPU memory to train on a single V100, serving as a strong baseline for future research. Full code is available at https://zhangzjn.github.io/projects/ViTAD/. (Volume 253, Article 104308)
Citations: 0
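A minimal sketch of the reconstruction-based (Meta-AD) idea with a plain MSE objective, as the abstract describes, is shown below. The `encoder` (frozen plain-ViT patch tokens of shape (B, N, D)) and `decoder` interfaces are assumptions for illustration; the actual ViTAD architecture and its seven-metric evaluation differ.

```python
import torch
import torch.nn.functional as F

def train_step(encoder, decoder, imgs, optimizer):
    """Feature-reconstruction training on normal images only: the decoder learns to
    reproduce the frozen plain-ViT features, using just an MSE loss."""
    with torch.no_grad():
        feats = encoder(imgs)                             # assumed (B, N, D) patch tokens
    recon = decoder(feats)
    loss = F.mse_loss(recon, feats)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()
def anomaly_map(encoder, decoder, imgs, patch_hw):
    """At test time the per-patch reconstruction error serves as the anomaly score."""
    feats = encoder(imgs)
    err = (decoder(feats) - feats).pow(2).mean(dim=-1)    # (B, N) patch-wise error
    return err.reshape(imgs.size(0), 1, *patch_hw)        # coarse spatial anomaly map
```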
Monocular per-object distance estimation with Masked Object Modeling
IF 4.3 | Tier 3 | Computer Science
Computer Vision and Image Understanding | Pub Date: 2025-02-03 | DOI: 10.1016/j.cviu.2025.104303
Aniello Panariello, Gianluca Mancusi, Fedy Haj Ali, Angelo Porrello, Simone Calderara, Rita Cucchiara
Abstract: Per-object distance estimation is critical in surveillance and autonomous driving, where safety is crucial. While existing methods rely on geometric or deep supervised features, only a few attempts have been made to leverage self-supervised learning. In this respect, our paper draws inspiration from Masked Image Modeling (MiM) and extends it to multi-object tasks. While MiM focuses on extracting global image-level representations, it struggles with individual objects within the image. This is detrimental for distance estimation, as distant objects correspond to negligible portions of the image. Our strategy, termed Masked Object Modeling (MoM), instead enables a novel application of masking techniques: we devise an auxiliary objective that reconstructs the portions of the image pertaining to the objects detected in the scene. Training is performed in a single unified stage, simultaneously optimizing the masking objective and the downstream loss (i.e., distance estimation). We evaluate the effectiveness of MoM on a novel reference architecture (DistFormer) on the standard KITTI, NuScenes, and MOTSynth datasets. Our evaluation reveals that our framework surpasses the state of the art and highlights its robust regularization properties. The MoM strategy enhances both zero-shot and few-shot capabilities, from the synthetic to the real domain. Finally, it further improves the robustness of the model in the presence of occluded or poorly detected objects. (Volume 253, Article 104303)
Citations: 0
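The single-stage objective the abstract outlines (distance regression plus an auxiliary reconstruction of masked object regions) could look roughly like the sketch below. The `model(masked, boxes)` interface, the box format, the smooth-L1 distance loss, and the weighting `lam` are all assumptions for illustration; this is not the DistFormer implementation.

```python
import torch
import torch.nn.functional as F

def mom_training_step(model, imgs, boxes, gt_dist, optimizer, lam=1.0):
    """Single-stage sketch: per-object distance regression plus an auxiliary loss
    that reconstructs the masked object regions. `model` is assumed to return
    (pred_dist, recon) given the masked image and the boxes."""
    masked = imgs.clone()
    for b, (x1, y1, x2, y2) in boxes:                      # zero out each detected object
        masked[b, :, y1:y2, x1:x2] = 0.0
    pred_dist, recon = model(masked, boxes)                # distances + reconstructed image
    loss_dist = F.smooth_l1_loss(pred_dist, gt_dist)       # downstream distance loss
    loss_recon = sum(F.mse_loss(recon[b, :, y1:y2, x1:x2], imgs[b, :, y1:y2, x1:x2])
                     for b, (x1, y1, x2, y2) in boxes) / max(len(boxes), 1)
    loss = loss_dist + lam * loss_recon                    # optimized jointly, single stage
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```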
Fake News Detection Based on BERT Multi-domain and Multi-modal Fusion Network
IF 4.3 | Tier 3 | Computer Science
Computer Vision and Image Understanding | Pub Date: 2025-02-01 | DOI: 10.1016/j.cviu.2025.104301
Kai Yu, Shiming Jiao, Zhilong Ma
Abstract: The pervasive growth of the Internet has simplified communication, making the detection and annotation of fake news on social media increasingly critical. Building on existing studies, this work introduces Fake News Detection Based on a BERT Multi-domain and Multi-modal Fusion Network (BMMFN). The framework uses the BERT model to transform the text content of fake news into textual vectors, while image features are extracted with the VGG-19 model. A multimodal fusion network is developed that accounts for text-image correlations and interactions through joint matrices, enhancing the integration of information across modalities. Additionally, a multi-domain classifier is incorporated to align multimodal features from various events within a unified feature space. The model's performance is confirmed through experiments on the Weibo and Twitter datasets, with results indicating that BMMFN surpasses contemporary state-of-the-art models on several metrics, thereby effectively enhancing fake news detection. (Volume 252, Article 104301)
Citations: 0
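The feature-extraction pipeline named in the abstract (BERT for text vectors, VGG-19 for image features) is sketched below; the outer-product "joint matrix" is only one plausible reading of the fusion step and is not claimed to be the paper's exact BMMFN design. The projection dimension and pretrained checkpoints are illustrative.

```python
import torch
import torch.nn as nn
from transformers import BertModel
from torchvision import models

class TextImageEncoder(nn.Module):
    """BERT text vector + VGG-19 image feature, each projected to a shared
    dimension and fused into a joint matrix via an outer product."""
    def __init__(self, dim=128):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT)
        self.vgg = nn.Sequential(vgg.features, nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.text_proj = nn.Linear(self.bert.config.hidden_size, dim)
        self.img_proj = nn.Linear(512, dim)

    def forward(self, input_ids, attention_mask, image):
        t = self.bert(input_ids=input_ids,
                      attention_mask=attention_mask).last_hidden_state[:, 0]  # [CLS] text vector
        v = self.vgg(image)                                # (B, 512) image feature
        t, v = self.text_proj(t), self.img_proj(v)
        joint = torch.bmm(t.unsqueeze(2), v.unsqueeze(1))  # (B, dim, dim) joint matrix
        return joint.flatten(1)                            # fed to the fake-news classifier
```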
Local optimization cropping and boundary enhancement for end-to-end weakly-supervised segmentation network
IF 4.3 | Tier 3 | Computer Science
Computer Vision and Image Understanding | Pub Date: 2025-02-01 | DOI: 10.1016/j.cviu.2024.104260
Weizheng Wang, Chao Zeng, Haonan Wang, Lei Zhou
Abstract: In recent years, the performance of weakly-supervised semantic segmentation (WSSS) has improved significantly. WSSS typically employs image-level labels to generate Class Activation Maps (CAMs) for producing pseudo-labels, which greatly reduces the cost of annotation. Since CNNs cannot fully identify object regions, researchers have found that Vision Transformers (ViTs) can complement CNNs by better extracting global contextual information. However, ViTs also introduce the problem of over-smoothing. Great progress has been made on the over-smoothing problem in recent years, yet two issues remain. The first is that high-confidence regions in the network-generated CAM still contain areas irrelevant to the class. The second is the inaccuracy of CAM boundaries, which include small portions of background regions; the precision of label boundaries is closely tied to segmentation performance. In this work, to address the first issue, we propose a local optimization cropping module (LOC). By randomly cropping selected regions, we contrast the local class tokens with the global class tokens, enhancing the consistency between local and global representations. To address the second issue, we design a boundary enhancement module (BE) that uses an erasing strategy to re-train the image, increasing the network's extraction of boundary information and greatly improving the accuracy of CAM boundaries, thereby enhancing the quality of the pseudo-labels. Experiments show that our proposed LOC-BE Net outperforms multi-stage methods and is competitive with end-to-end methods. On the PASCAL VOC dataset, our method achieves a CAM mIoU of 74.2% and a segmentation mIoU of 73.1%; on the COCO2014 dataset, it achieves a CAM mIoU of 43.8% and a segmentation mIoU of 43.4%. Our code is open source: https://github.com/whn786/LOC-BE/tree/main. (Volume 251, Article 104260)
Citations: 0
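The local-crop idea behind the LOC module (contrast the class token of a random local crop against the class token of the full image to enforce local/global consistency) can be sketched as below. The single random crop, the resize back to full resolution, the cosine loss, and the `vit` call returning a (B, D) class token are assumptions for illustration, not the paper's exact module.

```python
import torch
import torch.nn.functional as F

def local_global_consistency(vit, imgs, crop_size=96):
    """Contrast a randomly cropped region's class token against the full image's
    class token, encouraging consistent local and global representations."""
    B, _, H, W = imgs.shape
    y = torch.randint(0, H - crop_size + 1, (1,)).item()
    x = torch.randint(0, W - crop_size + 1, (1,)).item()
    crops = F.interpolate(imgs[:, :, y:y + crop_size, x:x + crop_size],
                          size=(H, W), mode="bilinear", align_corners=False)
    cls_global = vit(imgs)       # (B, D) class token of the full image
    cls_local = vit(crops)       # (B, D) class token of the cropped region
    return 1 - F.cosine_similarity(cls_local, cls_global.detach(), dim=1).mean()
```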
Guided image filtering, conventional to deep models: A review and evaluation study
IF 4.3 | Tier 3 | Computer Science
Computer Vision and Image Understanding | Pub Date: 2025-02-01 | DOI: 10.1016/j.cviu.2025.104278
Weimin Yuan, Yinuo Wang, Cai Meng, Xiangzhi Bai
Abstract: In the past decade, guided image filtering (GIF) has emerged as a successful edge-preserving smoothing technique designed to remove noise while retaining important edges and structures in images. By leveraging a well-aligned guidance image as a prior, GIF has become a valuable tool in various visual applications, offering a balance between edge preservation and computational efficiency. Despite significant advancements and the development of numerous GIF variants, there has been limited effort to systematically review and evaluate the diverse methods within this research community. To address this gap, this paper offers a comprehensive survey of existing GIF variants, covering both conventional and deep learning-based models. Specifically, we begin by introducing the basic formulation of GIF and its fast implementations. Next, we categorize the follow-up GIF methods into three main categories: local methods, global methods, and deep learning-based methods. Within each category, we provide a new sub-taxonomy to better illustrate the motivations behind their design, as well as their contributions and limitations. We then conduct experiments to compare the performance of representative methods, with an analysis of qualitative and quantitative results that reveals several insights into the current state of this research area. Finally, we discuss unresolved issues in the field of GIF and highlight open problems for further research. (Volume 252, Article 104278)
Citations: 0
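As context for the survey, the basic guided filter formulation it starts from (He et al.'s local linear model q = a*I + b, solved per window by ridge regression of the filtering input onto the guidance) reduces to a handful of box-filter operations. A minimal grayscale NumPy/SciPy sketch follows; the radius and regularization eps are illustrative defaults.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(I, p, radius=8, eps=1e-2):
    """Basic grayscale guided image filter: within each window, q = a*I + b, with
    a and b obtained from a local linear ridge regression of p onto the guidance I.

    I, p: 2D float arrays (guidance image and filtering input) of the same shape.
    """
    size = 2 * radius + 1
    mean = lambda x: uniform_filter(x, size)       # box filter over each window
    mean_I, mean_p = mean(I), mean(p)
    cov_Ip = mean(I * p) - mean_I * mean_p         # per-window covariance of (I, p)
    var_I = mean(I * I) - mean_I * mean_I          # per-window variance of I
    a = cov_Ip / (var_I + eps)                     # edge-aware linear coefficient
    b = mean_p - a * mean_I
    return mean(a) * I + mean(b)                   # average coefficients over windows
```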