IET Computer Vision: Latest Publications

Crafting Transferable Adversarial Examples Against 3D Object Detection
IF 1.5 | CAS Tier 4 | Computer Science
IET Computer Vision | Pub Date: 2025-03-26 | DOI: 10.1049/cvi2.70011
Haiyan Long, Hai Chen, Mengyao Xu, Chonghao Zhang, Fulan Qian
Abstract: 3D object detection, which perceives the surrounding environment through LiDAR and camera sensors to recognise the category and location of objects in a scene, is currently a popular research topic. Deep neural networks (DNNs) have been found to be vulnerable to adversarial examples. Although some approaches have begun to investigate the robustness of 3D object detection models, they generate adversarial examples in a white-box setting, and research on generating transferable adversarial examples in a black-box setting is lacking. In this paper, a non-end-to-end attack algorithm is proposed for LiDAR pipelines that crafts transferable adversarial examples against 3D object detection. Specifically, the method generates adversarial examples by restraining features with a high contribution to downstream tasks and amplifying features with a low contribution in the feature space. Extensive experiments validate that the method produces more transferable adversarial point clouds; for example, on the nuScenes dataset the generated adversarial point clouds are about 10% and 7% better than the state-of-the-art method on mAP and NDS, respectively.
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70011
Citations: 0
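
A minimal, self-contained sketch of the feature-space idea described in the abstract above: estimate how much each feature channel contributes to a downstream task, then optimise a bounded point perturbation that restrains high-contribution channels and amplifies low-contribution ones. The toy encoder, the gradient-based contribution score, the energy-based restrain/amplify objective and the L-infinity bound are illustrative assumptions, not the paper's actual pipeline.

    import torch
    import torch.nn as nn

    class ToyEncoder(nn.Module):
        """Stand-in for a LiDAR backbone: (B, N, 3) points -> (B, C, N) features."""
        def __init__(self, c=64):
            super().__init__()
            self.net = nn.Sequential(nn.Conv1d(3, c, 1), nn.ReLU(), nn.Conv1d(c, c, 1))

        def forward(self, pts):
            return self.net(pts.transpose(1, 2))

    def channel_contribution(encoder, head, points, labels):
        """Per-channel contribution score: mean |dL/df| of a downstream loss w.r.t. features."""
        feats = encoder(points)
        feats.retain_grad()
        loss = nn.functional.cross_entropy(head(feats.mean(dim=2)), labels)
        loss.backward()
        return feats.grad.abs().mean(dim=(0, 2)).detach()           # (C,)

    def craft_adv_points(encoder, head, points, labels, steps=50, eps=0.05, lr=0.01):
        w = channel_contribution(encoder, head, points, labels)
        w = w / (w.max() + 1e-12)                                    # close to 1 = high contribution
        delta = torch.zeros_like(points, requires_grad=True)
        opt = torch.optim.Adam([delta], lr=lr)
        for _ in range(steps):
            energy = encoder(points + delta).pow(2).mean(dim=(0, 2))   # per-channel feature energy
            loss = (w * energy).sum() - ((1.0 - w) * energy).sum()     # restrain vs. amplify
            opt.zero_grad()
            loss.backward()
            opt.step()
            with torch.no_grad():
                delta.clamp_(-eps, eps)                                # keep the perturbation bounded
        return (points + delta).detach()

    # Illustration with random data:
    enc, head = ToyEncoder(), nn.Linear(64, 10)
    pts, y = torch.randn(2, 1024, 3), torch.randint(0, 10, (2,))
    adv_pts = craft_adv_points(enc, head, pts, y)
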
Recent Advances of Continual Learning in Computer Vision: An Overview
IF 1.5 | CAS Tier 4 | Computer Science
IET Computer Vision | Pub Date: 2025-03-19 | DOI: 10.1049/cvi2.70013
Haoxuan Qu, Hossein Rahmani, Li Xu, Bryan Williams, Jun Liu
Abstract: In contrast to batch learning, where all training data is available at once, continual learning refers to a family of methods that accumulate knowledge and learn continuously from data arriving in sequential order. Similar to the human learning process, with its ability to learn, fuse and accumulate new knowledge acquired at different time steps, continual learning is considered to have high practical significance and has been studied in various artificial intelligence tasks. This paper presents a comprehensive review of recent progress in continual learning for computer vision. In particular, the works are grouped by their representative techniques, including regularisation, knowledge distillation, memory, generative replay, parameter isolation and combinations of the above. For each category, both its characteristics and its applications in computer vision are presented. The overview closes with a discussion of several subareas where continuous knowledge accumulation is potentially helpful but continual learning has not yet been well studied.
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70013
Citations: 0
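
As a concrete illustration of the regularisation family named in the survey above, the sketch below shows an EWC-style quadratic penalty that discourages updates on a new task from moving parameters that were important for a previous task. The diagonal-Fisher importance estimate and the penalty weight are standard choices used here for illustration, not something specific to the overview.

    import torch

    def fisher_importance(model, loader, loss_fn):
        """Diagonal Fisher approximation: squared gradients averaged over the old task's data."""
        fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
        for x, y in loader:
            model.zero_grad()
            loss_fn(model(x), y).backward()
            for n, p in model.named_parameters():
                if p.grad is not None:
                    fisher[n] += p.grad.detach() ** 2
        return {n: f / max(len(loader), 1) for n, f in fisher.items()}

    def continual_penalty(model, old_params, fisher, lam=100.0):
        """Quadratic penalty keeping parameters close to their values after the old task."""
        reg = 0.0
        for n, p in model.named_parameters():
            reg = reg + (fisher[n] * (p - old_params[n]) ** 2).sum()
        return lam * reg

    # During training on a new task:
    #   loss = task_loss + continual_penalty(model, old_params, fisher)
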
A Review of Multi-Object Tracking in Recent Times
IF 1.5 | CAS Tier 4 | Computer Science
IET Computer Vision | Pub Date: 2025-03-09 | DOI: 10.1049/cvi2.70010
Suya Li, Hengyi Ren, Xin Xie, Ying Cao
Abstract: Multi-object tracking (MOT) is a fundamental problem in computer vision that involves tracing the trajectories of foreground targets throughout a video sequence while establishing correspondences for identical objects across frames. With the advancement of deep learning, deep learning-based methods have significantly improved the accuracy and efficiency of MOT. This paper reviews several recent deep learning-based MOT methods and categorises them into three main groups according to their core technologies: detection-based, single-object tracking (SOT)-based and segmentation-based methods. The paper also discusses the metrics and datasets used for evaluating MOT performance, the challenges faced in the field and future research directions.
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70010
Citations: 0
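
The detection-based (tracking-by-detection) category mentioned above can be illustrated by its association step. The greedy IoU matching below is a deliberately simplified sketch: real trackers typically add motion models (e.g. Kalman filters) and appearance embeddings, so this is an illustrative assumption rather than a method from the review.

    from typing import List, Tuple

    def iou(a: Tuple[float, float, float, float], b: Tuple[float, float, float, float]) -> float:
        """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
        ax1, ay1, ax2, ay2 = a
        bx1, by1, bx2, by2 = b
        iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
        ih = max(0.0, min(ay2, by2) - max(ay1, by1))
        inter = iw * ih
        union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
        return inter / union if union > 0 else 0.0

    def associate(tracks: List[dict], detections: List[tuple], iou_thr: float = 0.3):
        """Greedily match each track's last box to the best unused detection."""
        pairs = sorted(((iou(t["box"], d), ti, di)
                        for ti, t in enumerate(tracks)
                        for di, d in enumerate(detections)), reverse=True)
        used_t, used_d, matches = set(), set(), []
        for score, ti, di in pairs:
            if score < iou_thr or ti in used_t or di in used_d:
                continue
            matches.append((ti, di))
            used_t.add(ti)
            used_d.add(di)
        return matches  # unmatched detections typically start new tracks
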
TAPCNet: Tactile-Assisted Point Cloud Completion Network via Iterative Fusion Strategy
IF 1.5 | CAS Tier 4 | Computer Science
IET Computer Vision | Pub Date: 2025-03-07 | DOI: 10.1049/cvi2.70012
Yangrong Liu, Jian Li, Huaiyu Wang, Ming Lu, Haorao Shen, Qin Wang
Abstract: With the development of the 3D point cloud field in recent years, point cloud completion of 3D objects has attracted increasing attention from researchers. Point cloud data can accurately express the shape of 3D objects at different resolutions, but raw point clouds collected directly by 3D scanning equipment are often incomplete and unevenly dense. Touch is a distinctive way to perceive the 3D shape of an object: tactile point clouds can provide local shape information for unknown regions during completion, a valuable complement to point cloud data acquired with visual devices. To effectively improve point cloud completion using tactile information, the authors propose an innovative tactile-assisted point cloud completion network, TAPCNet. This network is the first customised for the joint input of tactile point clouds and incomplete point clouds, fusing the two types of point cloud information in the feature domain. In addition, a new dataset named 3DVT was rebuilt to fit the proposed model. Based on the tactile fusion strategy and related modules, multiple comparative experiments were conducted on the 3DVT dataset by controlling the quantity of tactile point clouds. The experimental results show that TAPCNet outperforms state-of-the-art methods on the benchmark.
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70012
Citations: 0
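
A minimal sketch of the fusion idea described above: encode the incomplete point cloud and the tactile point cloud separately, fuse the two global features, and decode a completed cloud. The PointNet-style encoders, the concatenation-based fusion and the single linear decoder are illustrative assumptions, not the TAPCNet architecture or its iterative fusion strategy.

    import torch
    import torch.nn as nn

    class GlobalEncoder(nn.Module):
        """(B, N, 3) points -> (B, d) global feature via a shared point-wise MLP and max pooling."""
        def __init__(self, d=256):
            super().__init__()
            self.mlp = nn.Sequential(nn.Conv1d(3, 128, 1), nn.ReLU(), nn.Conv1d(128, d, 1))

        def forward(self, pts):
            return self.mlp(pts.transpose(1, 2)).max(dim=2).values

    class TactileFusionCompletion(nn.Module):
        def __init__(self, d=256, n_out=2048):
            super().__init__()
            self.enc_visual = GlobalEncoder(d)
            self.enc_tactile = GlobalEncoder(d)
            self.fuse = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU())
            self.decode = nn.Linear(d, n_out * 3)
            self.n_out = n_out

        def forward(self, partial_pts, tactile_pts):
            # Fuse the two modality features in the feature domain, then decode a full cloud.
            f = torch.cat([self.enc_visual(partial_pts), self.enc_tactile(tactile_pts)], dim=1)
            return self.decode(self.fuse(f)).view(-1, self.n_out, 3)

    # Usage with dummy data:
    # completed = TactileFusionCompletion()(torch.randn(2, 1024, 3), torch.randn(2, 64, 3))
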
Generating Transferable Adversarial Point Clouds via Autoencoders for 3D Object Classification
IF 1.5 | CAS Tier 4 | Computer Science
IET Computer Vision | Pub Date: 2025-03-05 | DOI: 10.1049/cvi2.70008
Mengyao Xu, Hai Chen, Chonghao Zhang, Yuanjun Zou, Chenchu Xu, Yanping Zhang, Fulan Qian
Abstract: Recent studies have shown that deep neural networks are vulnerable to adversarial attacks. In 3D point cloud classification, transfer-based black-box attack strategies have been explored to address the limited knowledge about the target model available in practical scenarios. However, existing approaches typically rely excessively on the network structure, resulting in poor transferability of the generated adversarial examples. To address this problem, the authors propose AEattack, an adversarial attack method capable of generating highly transferable adversarial examples. Specifically, AEattack employs an autoencoder (AE) to extract features from the point cloud data and reconstruct the adversarial point cloud from these features. Notably, the AE does not require pre-training; its parameters are jointly optimised with a loss function during the generation of adversarial point clouds. As a result, the generated adversarial point clouds do not depend heavily on the network structure but are instead governed more by the data distribution, which also gives AEattack broader potential for application. Extensive experiments on the ModelNet40 dataset show that AEattack generates highly transferable adversarial point clouds, with up to a 61.8% improvement in transferability compared with state-of-the-art adversarial attacks.
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70008
Citations: 0
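
A minimal sketch of the autoencoder-driven idea described above: reconstruct a point cloud through a small, untrained autoencoder and optimise the autoencoder's parameters so the reconstruction misleads a surrogate classifier while staying close to the original shape. The architecture, the Chamfer-distance regulariser and the cross-entropy objective are illustrative assumptions, not the AEattack algorithm; the surrogate classifier is assumed to map (B, N, 3) points to class logits.

    import torch
    import torch.nn as nn

    class PointAE(nn.Module):
        """Tiny point cloud autoencoder: (B, N, 3) -> global feature -> reconstructed (B, N, 3)."""
        def __init__(self, n_pts=1024, d=128):
            super().__init__()
            self.enc = nn.Sequential(nn.Conv1d(3, d, 1), nn.ReLU(), nn.Conv1d(d, d, 1))
            self.dec = nn.Linear(d, n_pts * 3)
            self.n_pts = n_pts

        def forward(self, pts):
            z = self.enc(pts.transpose(1, 2)).max(dim=2).values
            return self.dec(z).view(-1, self.n_pts, 3)

    def chamfer(a, b):
        """Symmetric Chamfer distance between two point clouds of shape (B, N, 3)."""
        d = torch.cdist(a, b)
        return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()

    def ae_attack(classifier, points, labels, steps=100, lr=1e-3, beta=10.0):
        ae = PointAE(n_pts=points.shape[1])
        opt = torch.optim.Adam(ae.parameters(), lr=lr)
        for _ in range(steps):
            adv = ae(points)
            # Push the surrogate away from the true label while keeping the shape close.
            loss = -nn.functional.cross_entropy(classifier(adv), labels) + beta * chamfer(adv, points)
            opt.zero_grad()
            loss.backward()
            opt.step()
        return ae(points).detach()
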
A New Large-Scale Dataset for Marine Vessel Re-Identification Based on Swin Transformer Network in Ocean Surveillance Scenario
IF 1.5 | CAS Tier 4 | Computer Science
IET Computer Vision | Pub Date: 2025-03-02 | DOI: 10.1049/cvi2.70007
Zhi Lu, Liguo Sun, Pin Lv, Jiuwu Hao, Bo Tang, Xuanzhen Chen
Abstract: In recent years, marine vessels, an important object category in marine monitoring, have increasingly become a focal point of computer vision research in tasks such as detection, tracking and classification. Among these, marine vessel re-identification (Re-ID) is a significant frontier topic that faces not only the dual challenge of large intra-class and small inter-class differences but also complex environmental interference in port monitoring scenarios. To advance marine vessel Re-ID technology, SwinTransReID, a framework grounded in the Swin Transformer, is introduced. Specifically, the triplet images are first encoded separately as sequences of blocks, and a baseline model is built on the Swin Transformer, achieving better performance on the Re-ID benchmark dataset than convolutional neural network (CNN)-based approaches. Side information embedding (SIE) is further introduced to enhance the robust feature-learning capability of the Swin Transformer, integrating non-visual cues (vessel orientation and type) and other auxiliary information (hull colour) through learnable embedding modules. Additionally, the work presents VesselReID-1656, the first annotated large-scale benchmark dataset for vessel Re-ID in real-world ocean surveillance, comprising 135,866 images of 1656 vessels with 5 orientations, 12 types and 17 colours. The proposed method achieves 87.1% mAP and 96.1% Rank-1 accuracy on this newly labelled, challenging dataset, surpassing the state-of-the-art (SOTA) method by 1.9% mAP. Extensive empirical results also demonstrate the superiority of SwinTransReID on the person Market-1501 dataset, the vehicle VeRi-776 dataset and the Boat Re-ID vessel dataset.
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70007
Citations: 0
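
A minimal sketch of the side information embedding (SIE) idea described above: learnable embeddings for discrete side information (orientation, vessel type, hull colour) are added to the backbone's patch tokens before further processing. The token shape, the summation scheme and the scaling factor are illustrative assumptions, not the exact SwinTransReID design; the category counts follow the 5 orientations, 12 types and 17 colours reported for VesselReID-1656.

    import torch
    import torch.nn as nn

    class SideInfoEmbedding(nn.Module):
        def __init__(self, dim, n_orient=5, n_type=12, n_colour=17, scale=1.0):
            super().__init__()
            self.orient = nn.Embedding(n_orient, dim)
            self.vtype = nn.Embedding(n_type, dim)
            self.colour = nn.Embedding(n_colour, dim)
            self.scale = scale

        def forward(self, tokens, orient_id, type_id, colour_id):
            # tokens: (B, L, dim) patch tokens; ids: (B,) integer side-information labels.
            side = self.orient(orient_id) + self.vtype(type_id) + self.colour(colour_id)
            return tokens + self.scale * side.unsqueeze(1)   # broadcast over the L tokens

    # Usage with dummy tokens:
    # sie = SideInfoEmbedding(dim=96)
    # out = sie(torch.randn(2, 3136, 96), torch.tensor([0, 3]), torch.tensor([1, 7]), torch.tensor([2, 5]))
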
Feature-Level Compensation and Alignment for Visible-Infrared Person Re-Identification
IF 1.5 | CAS Tier 4 | Computer Science
IET Computer Vision | Pub Date: 2025-02-25 | DOI: 10.1049/cvi2.70005
Husheng Dong, Ping Lu, Yuanfeng Yang, Xun Sun
Abstract: Visible-infrared person re-identification (VI-ReID) aims to match pedestrian images captured by non-overlapping visible and infrared cameras. Most existing compensation-based methods try to generate images of the missing modality from the other one; however, the generated images often lack sufficient quality because of severe discrepancies between the modalities. Moreover, it is generally assumed that person images are roughly aligned during the extraction of part-based local features, which does not always hold, particularly when images are cropped by inaccurate pedestrian detectors. To alleviate these problems, the authors propose a novel feature-level compensation and alignment network (FCA-Net) for VI-ReID, which compensates for the missing modality information at the channel level and aligns part-based local features. Specifically, the visible and infrared features of low-level subnetworks are first processed by a channel feature compensation (CFC) module, which enforces the network to learn consistent distribution patterns of channel features, thereby narrowing the cross-modality discrepancy. To address spatial misalignment, a pairwise relation module (PRM) is introduced to incorporate human structural information into part-based local features, significantly enhancing their discriminative power. In addition, a cross-modality part alignment loss (CPAL) is designed on the basis of a dynamic part matching algorithm, promoting more accurate local matching. Extensive experiments on three standard VI-ReID datasets validate the effectiveness of the proposed method and show that state-of-the-art performance is achieved.
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70005
Citations: 0
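
A minimal sketch in the spirit of the channel-level compensation described above: encourage visible and infrared feature maps to share per-channel statistics by penalising the gap between their channel-wise means and standard deviations. This moment-matching loss is an illustrative assumption, not the paper's CFC module.

    import torch

    def channel_stats(feat):
        """(B, C, H, W) feature map -> per-channel mean and std over batch and spatial dims."""
        mean = feat.mean(dim=(0, 2, 3))
        std = feat.std(dim=(0, 2, 3))
        return mean, std

    def channel_consistency_loss(feat_visible, feat_infrared):
        mv, sv = channel_stats(feat_visible)
        mi, si = channel_stats(feat_infrared)
        return (mv - mi).abs().mean() + (sv - si).abs().mean()

    # Combined with the usual identity objective during training:
    #   loss = id_loss + lambda_cfc * channel_consistency_loss(f_vis, f_ir)
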
Advancements in smart agriculture: A systematic literature review on state-of-the-art plant disease detection with computer vision
IF 1.5 | CAS Tier 4 | Computer Science
IET Computer Vision | Pub Date: 2025-02-14 | DOI: 10.1049/cvi2.70004
Esra Yilmaz, Sevim Ceylan Bocekci, Cengiz Safak, Kazim Yildiz
Abstract: In an era of rapid digital transformation, ensuring sustainable and traceable food production is more crucial than ever. Plant diseases, a major threat to agriculture, lead to significant crop losses and financial damage. Standard disease-detection techniques, though widespread, are time-consuming and labour-intensive, especially in extensive agricultural settings. This systematic literature review examines the cutting-edge technologies in smart agriculture, specifically computer vision, robotics, deep learning (DL) and the Internet of Things (IoT), that are reshaping plant disease detection and management. By analysing 198 studies published between 2021 and 2023, selected from an initial pool of 19,838 papers, the authors reveal the dominance of DL, particularly with datasets such as PlantVillage, and highlight critical challenges, including dataset limitations, lack of geographical diversity and the scarcity of real-world field data. The authors also explore the promising role of IoT, robotics and drones in enhancing early disease detection, although high costs and technological gaps present significant barriers for small-scale farmers, especially in developing countries. Following the preferred reporting items for systematic reviews and meta-analyses (PRISMA) methodology, the review synthesises these findings, identifying key trends, uncovering research gaps and offering actionable insights for the future of plant disease management in smart agriculture.
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70004
Citations: 0
Egocentric action anticipation from untrimmed videos
IF 1.5 | CAS Tier 4 | Computer Science
IET Computer Vision | Pub Date: 2025-02-14 | DOI: 10.1049/cvi2.12342
Ivan Rodin, Antonino Furnari, Giovanni Maria Farinella
Abstract: Egocentric action anticipation involves predicting future actions performed by the camera wearer from egocentric video. Although the task has recently gained attention in the research community, current approaches often assume that input videos are 'trimmed', meaning that a short video sequence is sampled a fixed time before the beginning of the action. However, trimmed action anticipation has limited applicability in real-world scenarios, where it is crucial to handle 'untrimmed' video inputs and the exact moment of action initiation cannot be assumed at test time. To address these limitations, an untrimmed action anticipation task is proposed, which, akin to temporal action detection, assumes that the input video is untrimmed at test time while still requiring predictions to be made before actions take place. The authors introduce a benchmark evaluation procedure for methods designed to address this novel task and compare several baselines on the EPIC-KITCHENS-100 dataset. Through this experimental evaluation of a variety of models, the authors aim to better understand their performance in untrimmed action anticipation. The results reveal that the performance of current models designed for trimmed action anticipation is limited, emphasising the need for further research in this area.
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12342
Citations: 0
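
A minimal sketch contrasting the two settings described above: in the trimmed setting, an observation clip is sampled a fixed anticipation time before a known action start; in the untrimmed setting, predictions are made at regular timestamps over the whole video without assuming the action start is known. The window lengths and stride are illustrative assumptions, not the paper's benchmark protocol.

    from typing import List, Tuple

    def trimmed_observation(action_start: float, anticipation: float = 1.0,
                            obs_len: float = 2.0) -> Tuple[float, float]:
        """Observation window ending `anticipation` seconds before the (known) action start."""
        end = action_start - anticipation
        return max(0.0, end - obs_len), max(0.0, end)

    def untrimmed_timestamps(video_len: float, stride: float = 0.5) -> List[float]:
        """Regularly spaced prediction timestamps over an untrimmed video."""
        t, out = 0.0, []
        while t < video_len:
            out.append(round(t, 3))
            t += stride
        return out

    # e.g. trimmed_observation(12.0) -> (9.0, 11.0)
    #      untrimmed_timestamps(3.0) -> [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
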
Controlling semantics of diffusion-augmented data for unsupervised domain adaptation
IF 1.5 | CAS Tier 4 | Computer Science
IET Computer Vision | Pub Date: 2025-02-07 | DOI: 10.1049/cvi2.70002
Henrietta Ridley, Roberto Alcover-Couso, Juan C. SanMiguel
Abstract: Unsupervised domain adaptation (UDA) offers a compelling way to bridge the gap between labelled synthetic data and unlabelled real-world data for training semantic segmentation models, given the high cost of manual annotation. However, the visual differences between synthetic and real images pose significant challenges to practical application. This work addresses these challenges through synthetic-to-real style transfer leveraging diffusion models. The proposal incorporates semantic controllers to guide the diffusion process and low-rank adaptations (LoRAs) to ensure that style-transferred images align with real-world aesthetics while preserving the semantic layout. The authors also introduce quality metrics to rank the utility of generated images, enabling the selective use of high-quality images for training. To further enhance reliability, a novel loss function is proposed that mitigates artefacts from the style transfer process by incorporating only pixels aligned with the original semantic labels. Experimental results demonstrate that the proposal outperforms selected state-of-the-art methods for image generation and UDA training, achieving optimal performance even with a smaller set of high-quality generated images. Code and models are available at http://www-vpu.eps.uam.es/ControllingSem4UDA/.
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70002
Citations: 0
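
A minimal sketch of the selective supervision idea described above: when training on style-transferred images, keep only pixels whose content still agrees with the original semantic labels and ignore the rest. Using the current model's own prediction on the generated image as the agreement test is an illustrative assumption, not necessarily the paper's exact criterion.

    import torch
    import torch.nn.functional as F

    def masked_segmentation_loss(logits, original_labels, ignore_index=255):
        """logits: (B, K, H, W) on the generated image; original_labels: (B, H, W) source labels."""
        with torch.no_grad():
            agrees = logits.argmax(dim=1) == original_labels   # pixels still consistent with source labels
        labels = original_labels.clone()
        labels[~agrees] = ignore_index                          # drop pixels likely corrupted by style transfer
        return F.cross_entropy(logits, labels, ignore_index=ignore_index)

    # Usage during UDA training on diffusion-augmented images:
    #   loss = masked_segmentation_loss(model(generated_images), source_labels)
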