Computer Vision and Image Understanding: Latest Articles

Cleanness-navigated-contamination network: A unified framework for recovering regional degradation
IF 4.3, Zone 3, Computer Science
Computer Vision and Image Understanding Pub Date : 2025-02-01 DOI: 10.1016/j.cviu.2024.104274
Qianhao Yu, Naishan Zheng, Jie Huang, Feng Zhao
Abstract: Image restoration from regional degradation has long been an important and challenging task. The key to contamination removal is recovering the contents of the corrupted regions with the guidance of the non-corrupted regions. Due to inadequate long-range modeling, CNN-based approaches cannot thoroughly exploit the information from non-corrupted regions, resulting in distorted visuals with artificial traces between different regions. To address this issue, we propose a novel Cleanness-Navigated-Contamination Network (CNCNet), a unified framework for recovering regional image contamination such as shadow, flare, and other regional degradation. Our method mainly consists of two components based on the contamination region mask: a contamination-oriented adaptive normalization (COAN) module and a contamination-aware aggregation with transformer (CAAT) module. Under the guidance of the contamination mask, the COAN module formulates the statistics from the non-corrupted region and adaptively applies them to the corrupted region for region-wise restoration. The CAAT module uses the region mask to precisely guide the restoration of each contaminated pixel by considering the highly relevant pixels from the contamination-free regions for global pixel-wise restoration. Extensive experiments on both shadow removal and flare removal tasks show that our network framework achieves superior restoration performance.
Volume 251, Article 104274
Citations: 0
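The region-wise idea in the CNCNet abstract above (take statistics from the non-corrupted region and apply them to the corrupted region) can be illustrated with a fixed, non-learned statistic transfer. This is only a sketch under stated assumptions: the function name `transfer_region_statistics`, the flat list of pixel intensities, and the mean/standard-deviation matching rule are all hypothetical choices for illustration. The paper's COAN module is described as adaptive, and its actual formulation is not given in the abstract.

```python
from statistics import mean, pstdev


def transfer_region_statistics(pixels, mask, eps=1e-6):
    """Re-normalize corrupted pixels so they match the clean region's statistics.

    pixels: list of float intensities (hypothetical flattened image).
    mask:   list of 0/1 flags, where 1 marks a corrupted pixel.
    Assumption: simple mean/std matching, not the learned COAN module.
    """
    clean = [p for p, m in zip(pixels, mask) if m == 0]
    dirty = [p for p, m in zip(pixels, mask) if m == 1]
    if not clean or not dirty:
        return list(pixels)  # nothing to transfer

    mu_clean, sd_clean = mean(clean), pstdev(clean)
    mu_dirty, sd_dirty = mean(dirty), pstdev(dirty)

    restored = []
    for p, m in zip(pixels, mask):
        if m == 1:
            # Whiten with corrupted-region statistics, then re-color
            # with the statistics of the clean region.
            restored.append((p - mu_dirty) / (sd_dirty + eps) * sd_clean + mu_clean)
        else:
            restored.append(p)  # clean pixels pass through unchanged
    return restored


# Toy example: two dark (shadow-like) pixels inside a brighter row.
pixels = [0.8, 0.6, 0.2, 0.1, 0.7, 0.9]
mask = [0, 0, 1, 1, 0, 0]
restored = transfer_region_statistics(pixels, mask)
```

After the transfer, the corrupted pixels have the same mean as the clean region, while the clean pixels are untouched. A learned module would presumably predict the scale and shift rather than copy them directly.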
Full-body virtual try-on using top and bottom garments with wearing style control
IF 4.3, Zone 3, Computer Science
Computer Vision and Image Understanding Pub Date : 2025-02-01 DOI: 10.1016/j.cviu.2024.104259
Soonchan Park, Jinah Park
Abstract: Various studies have been proposed to synthesize realistic images for image-based virtual try-on, but most of them are limited to replacing a single item on a given model, without considering wearing styles. In this paper, we address the novel problem of full-body virtual try-on with multiple garments by introducing a new benchmark dataset and an image synthesis method. Our Fashion-TB dataset provides comprehensive clothing information by mapping fashion models to their corresponding top and bottom garments, along with semantic region annotations to represent the structure of the garments. WGF-VITON, the single-stage network we have developed, generates full-body try-on images using top and bottom garments simultaneously. Instead of relying on preceding networks to estimate intermediate knowledge, modules for garment transformation and image synthesis are integrated and trained through end-to-end learning. Furthermore, our method proposes a Wearing-guide scheme to control the wearing styles in the synthesized try-on images. In various experiments on the full-body virtual try-on task, WGF-VITON outperforms state-of-the-art networks in both quantitative and qualitative evaluations with an optimized number of parameters, while allowing users to control the wearing styles of the output images. The code and data are available at https://github.com/soonchanpark/WGF-VITON.
Volume 251, Article 104259
Citations: 0
DM-Align: Leveraging the power of natural language instructions to make changes to images
IF 4.3, Zone 3, Computer Science
Computer Vision and Image Understanding Pub Date : 2025-02-01 DOI: 10.1016/j.cviu.2025.104292
Maria-Mihaela Trusca, Tinne Tuytelaars, Marie-Francine Moens
Abstract: Text-based semantic image editing assumes the manipulation of an image using a natural language instruction. Although recent works are capable of generating creative and qualitative images, the problem is still mostly approached as a black box sensitive to generating unexpected outputs. Therefore, we propose a novel model to enhance the text-based control of an image editor by explicitly reasoning about which parts of the image to alter or preserve. It relies on word alignments between a description of the original source image and the instruction that reflects the needed updates, and the input image. The proposed Diffusion Masking with word Alignments (DM-Align) allows the editing of an image in a transparent and explainable way. It is evaluated on a subset of the Bison dataset and a self-defined dataset dubbed Dream. Compared with state-of-the-art baselines, quantitative and qualitative results show that DM-Align has superior performance in image editing conditioned on language instructions, preserves the background of the image well, and copes better with long text instructions.
Volume 252, Article 104292
Citations: 0
Rebalanced supervised contrastive learning with prototypes for long-tailed visual recognition
IF 4.3, Zone 3, Computer Science
Computer Vision and Image Understanding Pub Date : 2025-02-01 DOI: 10.1016/j.cviu.2025.104291
Xuhui Chang, Junhai Zhai, Shaoxin Qiu, Zhengrong Sun
Abstract: In the real world, data often follows a long-tailed distribution, resulting in head classes receiving more attention while tail classes are frequently overlooked. Although supervised contrastive learning (SCL) performs well on balanced datasets, it struggles to distinguish features between tail classes in the latent space when dealing with long-tailed data. To address this issue, we propose Rebalanced Supervised Contrastive Learning (ReCL), which can effectively enhance the separability of tail-class features. Compared with two state-of-the-art methods, Contrastive Learning based hybrid networks (Hybrid-SC) and Targeted Supervised Contrastive Learning (TSC), ReCL has two distinctive characteristics: (1) ReCL enhances the clarity of classification boundaries between tail classes by encouraging samples to align more closely with their corresponding prototypes. (2) ReCL does not require target generation, thereby conserving computational resources. Our method significantly improves the recognition of tail classes, demonstrating competitive accuracy across multiple long-tailed datasets. Our code has been uploaded to https://github.com/cxh981110/ReCL.
Volume 252, Article 104291
Citations: 0
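The ReCL abstract above says samples are encouraged to align more closely with their class prototypes. One generic way to express that idea is a softmax cross-entropy over cosine similarities between a feature and every class prototype. The sketch below assumes that form: `prototype_loss`, the temperature value, and the toy prototypes are hypothetical, and this is not claimed to be the loss used in the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def prototype_loss(feature, label, prototypes, temperature=0.1):
    """Cross-entropy over similarities to class prototypes.

    Assumption: a generic prototype-alignment loss, used only to
    illustrate the idea; the actual ReCL objective may differ.
    """
    logits = [cosine(feature, p) / temperature for p in prototypes]
    # Numerically stable log-sum-exp.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    # Low when the feature is closest to its own class prototype.
    return log_sum - logits[label]


# Two hypothetical class prototypes in a 2-D feature space.
prototypes = [[1.0, 0.0], [0.0, 1.0]]
aligned = prototype_loss([0.9, 0.1], label=0, prototypes=prototypes)
misaligned = prototype_loss([0.1, 0.9], label=0, prototypes=prototypes)
```

Here `aligned` is smaller than `misaligned`, so minimizing the loss pulls a sample toward the prototype of its own class and away from the others.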
Graph-based Dense Event Grounding with relative positional encoding
IF 4.3, Zone 3, Computer Science
Computer Vision and Image Understanding Pub Date : 2025-02-01 DOI: 10.1016/j.cviu.2024.104257
Jianxiang Dong, Zhaozheng Yin
Abstract: Temporal Sentence Grounding (TSG) in videos aims to localize a temporal moment from an untrimmed video that is relevant to a given query sentence. Most existing methods focus on addressing the problem of single sentence grounding. Recently, researchers proposed a new Dense Event Grounding (DEG) problem by extending single-event localization to multi-event localization, where the temporal moments of multiple events described by multiple sentences are retrieved. In this paper, we introduce an effective proposal-based approach to solve the DEG problem. A Relative Sentence Interaction (RSI) module using a graph neural network is proposed to model the event relationship by introducing a temporal relative positional encoding to learn the relative temporal order information between sentences in a dense multi-sentence query. In addition, we design an Event-contextualized Cross-modal Interaction (ECI) module to tackle the lack of global information from other related events when fusing visual and sentence features. Finally, we construct an Event Graph (EG) with intra-event edges and inter-event edges to model the relationship between proposals in the same event and proposals in different events to further refine their representations for final localizations. Extensive experiments on the ActivityNet-Captions and TACoS datasets show the effectiveness of our solution.
Volume 251, Article 104257
Citations: 0
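The abstract above mentions a temporal relative positional encoding over the sentences of a multi-sentence query. A common way to set this up is to map each sentence pair (i, j) to a clipped offset j - i, which then indexes a table of learned embeddings. The sketch below shows only that indexing step; the function name and the clipping distance are assumptions, and the paper's RSI module may encode relative order differently.

```python
def relative_position_index(num_sentences, max_distance):
    """Build a matrix of clipped relative offsets between sentences.

    Entry [i][j] encodes how far sentence j is from sentence i in the
    query order, clipped to [-max_distance, max_distance] and shifted
    to a non-negative index in [0, 2 * max_distance].
    Assumption: generic clipped relative positions, not the exact
    encoding used in the paper.
    """
    index = []
    for i in range(num_sentences):
        row = []
        for j in range(num_sentences):
            offset = max(-max_distance, min(max_distance, j - i))
            row.append(offset + max_distance)  # shift so indices start at 0
        index.append(row)
    return index


# Three sentences, offsets clipped to one step before or after.
table = relative_position_index(3, 1)
# table == [[1, 2, 2],
#           [0, 1, 2],
#           [0, 0, 1]]
```

Index 1 means "same sentence", 0 means "earlier", and 2 means "later". Each index would select one row of an embedding table that is added to the pairwise interaction between sentence features.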
Pruning networks at once via nuclear norm-based regularization and bi-level optimization
IF 4.3, Zone 3, Computer Science
Computer Vision and Image Understanding Pub Date : 2025-02-01 DOI: 10.1016/j.cviu.2024.104247
Donghyeon Lee, Eunho Lee, Jaehyuk Kang, Youngbae Hwang
Abstract: Most network pruning methods focus on identifying redundant channels from pre-trained models, which is inefficient due to its three-step process: pre-training, pruning and fine-tuning, and reconfiguration. In this paper, we propose a pruning-from-scratch framework that unifies these processes into a single approach. We introduce nuclear norm-based regularization to maintain the representational capacity of large networks during pruning. Combining this with MACs-based regularization enhances the performance of the pruned network at the target compression rate. Our bi-level optimization approach simultaneously improves pruning efficiency and representation capacity. Experimental results show that our method achieves 75.4% accuracy on ImageNet without a pre-trained network, using only 41% of the original model's computational cost. It also attains 0.5% higher performance in compressing the SSD network for object detection. Furthermore, we analyze the effects of nuclear norm-based regularization.
Volume 251, Article 104247
Citations: 0
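For readers unfamiliar with the nuclear norm mentioned in the abstract above: it is the sum of a matrix's singular values, and it tends to be larger when the rows or columns are less redundant. The sketch below computes it for the special case of a 2x2 matrix, where the identity (s1 + s2)^2 = ||A||_F^2 + 2|det A| gives a closed form without an SVD routine. The function name is hypothetical, the restriction to 2x2 is purely for illustration, and how the paper applies the norm during pruning is not stated in the abstract.

```python
import math


def nuclear_norm_2x2(matrix):
    """Nuclear norm (sum of singular values) of a 2x2 matrix.

    Uses s1^2 + s2^2 = squared Frobenius norm and s1 * s2 = |det|,
    so (s1 + s2)^2 = frobenius^2 + 2 * |det|.
    Assumption: 2x2 only, as a worked illustration of the definition.
    """
    (a, b), (c, d) = matrix
    frobenius_sq = a * a + b * b + c * c + d * d
    det = a * d - b * c
    return math.sqrt(frobenius_sq + 2.0 * abs(det))


# Full-rank matrix: singular values 3 and 4, nuclear norm 7.
full_rank = nuclear_norm_2x2([[3.0, 0.0], [0.0, 4.0]])

# Rank-1 matrix with identical rows: singular values 2 and 0, nuclear norm 2.
redundant = nuclear_norm_2x2([[1.0, 1.0], [1.0, 1.0]])
```

The rank-1 matrix has two identical rows and a much smaller nuclear norm relative to its entries, which is the intuition behind using the norm as a proxy for how much non-redundant information a set of features carries.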
Semantic-preserved point-based human avatar
IF 4.3, Zone 3, Computer Science
Computer Vision and Image Understanding Pub Date : 2025-02-01 DOI: 10.1016/j.cviu.2025.104307
Lixiang Lin, Jianke Zhu
Abstract: To enable a realistic experience in AR/VR and digital entertainment, we present the first point-based human avatar model that embodies the entire expressive range of digital humans. Specifically, we employ two MLPs to model pose-dependent deformation and linear blend skinning (LBS) weights. The representation of appearance relies on a decoder and the features attached to each point. In contrast to alternative implicit approaches, the oriented points representation not only provides a more intuitive way to model human avatar animation but also significantly reduces the computational time for both training and inference. Moreover, we propose a novel method to transfer semantic information from the SMPL-X model to the points, which enables a better understanding of human body movements. By leveraging the semantic information of points, we can facilitate virtual try-on and human avatar composition by exchanging the points of the same category across different subjects. Experimental results demonstrate the efficacy of our presented method. Our implementation is publicly available at https://github.com/l1346792580123/spa.
Volume 252, Article 104307
Citations: 0
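The skinning weights (LBS) mentioned in the abstract above are used in the standard blend skinning rule: a point is moved by each bone's transform, and the results are averaged with per-point weights that sum to one. The sketch below shows that rule in 2-D with affine transforms written as plain tuples. The function name, the 2-D setting, and the toy transforms are illustrative assumptions; in the paper the weights are predicted by an MLP, which is not modeled here.

```python
def blend_skinning(point, weights, transforms):
    """Apply linear blend skinning to one 2-D point.

    point:      (x, y) in the rest pose.
    weights:    per-bone blend weights, assumed to sum to 1.
    transforms: per-bone affine maps (a, b, c, d, tx, ty), meaning
                x' = a*x + b*y + tx and y' = c*x + d*y + ty.
    Assumption: a 2-D toy version of the standard LBS formula.
    """
    x, y = point
    out_x = 0.0
    out_y = 0.0
    for w, (a, b, c, d, tx, ty) in zip(weights, transforms):
        # Transform the point by this bone, then accumulate the weighted result.
        out_x += w * (a * x + b * y + tx)
        out_y += w * (c * x + d * y + ty)
    return out_x, out_y


identity = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)
shift_right = (1.0, 0.0, 0.0, 1.0, 2.0, 0.0)  # translate by (2, 0)

# A point influenced equally by a static bone and a bone that moved right.
posed = blend_skinning((1.0, 1.0), [0.5, 0.5], [identity, shift_right])
# posed == (2.0, 1.0): the point moves half of the bone's translation.
```

With equal weights the point ends up halfway between the two bone-transformed positions, which is why smooth weight fields give smooth deformations near joints.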
Adversarial intensity awareness for robust object detection
IF 4.3, Zone 3, Computer Science
Computer Vision and Image Understanding Pub Date : 2025-02-01 DOI: 10.1016/j.cviu.2024.104252
Jikang Cheng, Baojin Huang, Yan Fang, Zhen Han, Zhongyuan Wang
Abstract: Like other computer vision models, object detectors are vulnerable to adversarial examples (AEs) containing imperceptible perturbations. These AEs can be generated with multiple intensities and then used to attack object detectors in real-world scenarios. One of the most effective ways to improve the robustness of object detectors is adversarial training (AT), which incorporates AEs into the training process. However, while previous AT-based models have shown certain robustness against adversarial attacks of a pre-specified intensity, they still struggle to maintain robustness when defending against adversarial attacks with multiple intensities. To address this issue, we propose a novel robust object detection method based on adversarial intensity awareness. We first explore potential schemas to define the relationship between the neglected intensity information and actual evaluation metrics in AT. Then, we propose the sequential intensity loss (SI Loss) to represent and leverage the neglected intensity information in the AEs. Specifically, SI Loss deploys a sequential adaptive strategy to transform intensity into concrete learnable metrics in a discrete and cumulative manner. Additionally, a boundary smoothing algorithm is introduced to mitigate the influence of particular AEs that are challenging to assign to a certain intensity level. Extensive experiments on the PASCAL VOC and MS-COCO datasets demonstrate the superior performance of our method over other defense methods against multi-intensity adversarial attacks.
Volume 251, Article 104252
Citations: 0
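The notion of adversarial examples generated at multiple intensities, as described in the abstract above, can be made concrete with a toy attack. The sketch below uses the fast gradient sign method (FGSM) on a linear scorer, where the intensity is the step size `eps`. FGSM and the linear model are swapped in purely for illustration: the abstract does not say which attacks or detectors the paper uses, and all names and numbers here are hypothetical.

```python
def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def score(x, w):
    """Linear score w . x; positive means the prediction matches y = +1."""
    return sum(xi * wi for xi, wi in zip(x, w))


def fgsm_linear(x, y, w, eps):
    """FGSM on the loss L = -y * (w . x), whose gradient in x is -y * w.

    Each input coordinate moves by eps in the direction that increases
    the loss, so eps plays the role of the attack intensity.
    Assumption: a toy linear model, not an object detector.
    """
    return [xi + eps * sign(-y * wi) for xi, wi in zip(x, w)]


x = [1.0, -0.5]   # clean input
w = [2.0, -1.0]   # hypothetical model weights
y = 1             # true label

# Margin y * score after attacks of increasing intensity.
margins = {eps: y * score(fgsm_linear(x, y, w, eps), w) for eps in (0.0, 0.1, 0.3)}
```

The margin shrinks as `eps` grows (by `eps` times the sum of absolute weights for this linear model), which shows why a defense tuned to one intensity may not cover stronger or weaker attacks.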
Joint Generating Terminal Correction Imaging method for modular LED integral imaging systems
IF 4.3, Zone 3, Computer Science
Computer Vision and Image Understanding Pub Date : 2025-02-01 DOI: 10.1016/j.cviu.2025.104279
Tianshu Li, Shigang Wang
Abstract: Integral imaging has garnered significant attention in 3D display technology due to its potential for high-quality visualization. However, elemental images in integral imaging systems usually suffer from misalignment due to the mechanical or human-induced assembly within the lens arrays, leading to undesirable display quality. This paper introduces a novel Joint-Generating Terminal Correction Imaging (JGTCI) approach tailored for large-scale, modular LED integral imaging systems to address the misalignment between the optical centers of physical lens arrays and the camera in generated elemental image arrays. Specifically, we propose: (1) a high-sensitivity calibration marker to enhance alignment precision by accurately matching lens centers to the central points of elemental images; (2) a partitioned calibration strategy that supports independent calibration of display sections, enabling seamless system expansion without recalibrating previously adjusted regions; and (3) a calibration setup where markers are strategically placed near the lens focal length, ensuring optimal pixel coverage in the camera frame for improved accuracy. Extensive experimental results demonstrate that our JGTCI approach significantly enhances 3D display accuracy, extends the viewing angle, and improves the scalability and practicality of modular integral imaging systems, outperforming recent state-of-the-art methods.
Volume 252, Article 104279
Citations: 0
Corrigendum to “LightSOD: Towards lightweight and efficient network for salient object detection” [J. Comput. Vis. Imag. Underst. 249 (2024) 104148]
IF 4.3, Zone 3, Computer Science
Computer Vision and Image Understanding Pub Date : 2025-02-01 DOI: 10.1016/j.cviu.2024.104277
Thien-Thu Ngo, Hoang Ngoc Tran, Md. Delowar Hossain, Eui-Nam Huh
Volume 252, Article 104277
Citations: 0