International Journal of Computer Vision: Latest Articles

ICEv2: Interpretability, Comprehensiveness, and Explainability in Vision Transformer
IF 19.5, Zone 2, Computer Science
International Journal of Computer Vision. Pub Date: 2024-11-26. DOI: 10.1007/s11263-024-02290-6
Hoyoung Choi, Seungwan Jin, Kyungsik Han
{"title":"ICEv2: Interpretability, Comprehensiveness, and Explainability in Vision Transformer","authors":"Hoyoung Choi, Seungwan Jin, Kyungsik Han","doi":"10.1007/s11263-024-02290-6","DOIUrl":"https://doi.org/10.1007/s11263-024-02290-6","url":null,"abstract":"<p>Vision transformers use [CLS] token to predict image classes. Their explainability visualization has been studied using relevant information from the [CLS] token or focusing on attention scores during self-attention. However, such visualization is challenging because of the dependence of the interpretability of a vision transformer on skip connections and attention operators, the instability of non-linearities in the learning process, and the limited reflection of self-attention scores on relevance. We argue that the output patch embeddings in a vision transformer preserve the image information of each patch location, which can facilitate the prediction of an image class. In this paper, we propose ICEv2 (ICEv2: <span>({{{underline{varvec{I}}}}})</span>nterpretability, <span>({{{underline{varvec{C}}}}})</span>omprehensiveness, and <span>({{{underline{varvec{E}}}}})</span>xplainability in Vision Transformer), an explainability visualization method that addresses the limitations of ICE (i.e., high dependence of hyperparameters on performance and the inability to preserve the model’s properties) by minimizing the number of training encoder layers, redesigning the MLP layer, and optimizing hyperparameters along with various model size. Overall, ICEv2 shows higher efficiency, performance, robustness, and scalability than ICE. On the ImageNet-Segmentation dataset, ICEv2 outperformed all explainability visualization methods in all cases depending on the model size. On the Pascal VOC dataset, ICEv2 outperformed both self-supervised and supervised methods on Jaccard similarity. In the unsupervised single object discovery, where untrained classes are present in the images, ICEv2 effectively distinguished between foreground and background, showing performance comparable to the previous state-of-the-art. Lastly, ICEv2 can be trained with significantly lower training computational complexity.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"67 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2024-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142718529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Globally Correlation-Aware Hard Negative Generation
IF 19.5, Zone 2, Computer Science
International Journal of Computer Vision. Pub Date: 2024-11-25. DOI: 10.1007/s11263-024-02288-0
Wenjie Peng, Hongxiang Huang, Tianshui Chen, Quhui Ke, Gang Dai, Shuangping Huang
{"title":"Globally Correlation-Aware Hard Negative Generation","authors":"Wenjie Peng, Hongxiang Huang, Tianshui Chen, Quhui Ke, Gang Dai, Shuangping Huang","doi":"10.1007/s11263-024-02288-0","DOIUrl":"https://doi.org/10.1007/s11263-024-02288-0","url":null,"abstract":"<p>Hard negative generation aims to generate informative negative samples that help to determine the decision boundaries and thus facilitate advancing deep metric learning. Current works select pair/triplet samples, learn their correlations, and fuse them to generate hard negatives. However, these works merely consider the local correlations of selected samples, ignoring global sample correlations that would provide more significant information to generate more informative negatives. In this work, we propose a globally correlation-aware hard negative generation (GCA-HNG) framework, which first learns sample correlations from a global perspective and exploits these correlations to guide generating hardness-adaptive and diverse negatives. Specifically, this approach begins by constructing a structured graph to model sample correlations, where each node represents a specific sample and each edge represents the correlations between corresponding samples. Then, we introduce an iterative graph message propagation to propagate the messages of node and edge through the whole graph and thus learn the sample correlations globally. Finally, with the guidance of the learned global correlations, we propose a channel-adaptive manner to combine an anchor and multiple negatives for HNG. Compared to current methods, GCA-HNG allows perceiving sample correlations with numerous negatives from a global and comprehensive perspective and generates the negatives with better hardness and diversity. Extensive experiment results demonstrate that the proposed GCA-HNG is superior to related methods on four image retrieval benchmark datasets.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"80 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2024-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142697109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
IEBins: Iterative Elastic Bins for Monocular Depth Estimation and Completion
IF 19.5, Zone 2, Computer Science
International Journal of Computer Vision. Pub Date: 2024-11-25. DOI: 10.1007/s11263-024-02293-3
Shuwei Shao, Zhongcai Pei, Weihai Chen, Peter C. Y. Chen, Zhengguo Li
{"title":"IEBins: Iterative Elastic Bins for Monocular Depth Estimation and Completion","authors":"Shuwei Shao, Zhongcai Pei, Weihai Chen, Peter C. Y. Chen, Zhengguo Li","doi":"10.1007/s11263-024-02293-3","DOIUrl":"https://doi.org/10.1007/s11263-024-02293-3","url":null,"abstract":"<p>Monocular depth estimation and completion are fundamental aspects of geometric computer vision, serving as essential techniques for various downstream applications. In recent developments, several methods have reformulated these two tasks as a <i>classification-regression</i> problem, deriving depth with a linear combination of predicted probabilistic distribution and bin centers. In this paper, we introduce an innovative concept termed <b>iterative elastic bins (IEBins)</b> for the classification-regression-based monocular depth estimation and completion. The IEBins involves the idea of iterative division of bins. In the initialization stage, a coarse and uniform discretization is applied to the entire depth range. Subsequent update stages then iteratively identify and uniformly discretize the target bin, by leveraging it as the new depth range for further refinement. To mitigate the risk of error accumulation during iterations, we propose a novel elastic target bin, replacing the original one. The width of this elastic bin is dynamically adapted according to the depth uncertainty. Furthermore, we develop dedicated frameworks to instantiate the IEBins. Extensive experiments on the KITTI, NYU-Depth-v2, SUN RGB-D, ScanNet and DIODE datasets indicate that our method outperforms prior state-of-the-art monocular depth estimation and completion competitors.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"43 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2024-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142712464","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Transformer for Object Re-identification: A Survey
IF 19.5, Zone 2, Computer Science
International Journal of Computer Vision. Pub Date: 2024-11-23. DOI: 10.1007/s11263-024-02284-4
Mang Ye, Shuoyi Chen, Chenyue Li, Wei-Shi Zheng, David Crandall, Bo Du
{"title":"Transformer for Object Re-identification: A Survey","authors":"Mang Ye, Shuoyi Chen, Chenyue Li, Wei-Shi Zheng, David Crandall, Bo Du","doi":"10.1007/s11263-024-02284-4","DOIUrl":"https://doi.org/10.1007/s11263-024-02284-4","url":null,"abstract":"<p>Object Re-identification (Re-ID) aims to identify specific objects across different times and scenes, which is a widely researched task in computer vision. For a prolonged period, this field has been predominantly driven by deep learning technology based on convolutional neural networks. In recent years, the emergence of Vision Transformers has spurred a growing number of studies delving deeper into Transformer-based Re-ID, continuously breaking performance records and witnessing significant progress in the Re-ID field. Offering a powerful, flexible, and unified solution, Transformers cater to a wide array of Re-ID tasks with unparalleled efficacy. This paper provides a comprehensive review and in-depth analysis of the Transformer-based Re-ID. In categorizing existing works into Image/Video-Based Re-ID, Re-ID with limited data/annotations, Cross-Modal Re-ID, and Special Re-ID Scenarios, we thoroughly elucidate the advantages demonstrated by the Transformer in addressing a multitude of challenges across these domains. Considering the trending unsupervised Re-ID, we propose a new Transformer baseline, UntransReID, achieving state-of-the-art performance on both single/cross modal tasks. For the under-explored animal Re-ID, we devise a standardized experimental benchmark and conduct extensive experiments to explore the applicability of Transformer for this task and facilitate future research. Finally, we discuss some important yet under-investigated open issues in the large foundation model era, we believe it will serve as a new handbook for researchers in this field. A periodically updated website will be available at https://github.com/mangye16/ReID-Survey.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"15 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2024-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142690532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Reliable Evaluation of Attribution Maps in CNNs: A Perturbation-Based Approach
IF 19.5, Zone 2, Computer Science
International Journal of Computer Vision. Pub Date: 2024-11-23. DOI: 10.1007/s11263-024-02282-6
Lars Nieradzik, Henrike Stephani, Janis Keuper
{"title":"Reliable Evaluation of Attribution Maps in CNNs: A Perturbation-Based Approach","authors":"Lars Nieradzik, Henrike Stephani, Janis Keuper","doi":"10.1007/s11263-024-02282-6","DOIUrl":"https://doi.org/10.1007/s11263-024-02282-6","url":null,"abstract":"<p>In this paper, we present an approach for evaluating attribution maps, which play a central role in interpreting the predictions of convolutional neural networks (CNNs). We show that the widely used insertion/deletion metrics are susceptible to distribution shifts that affect the reliability of the ranking. Our method proposes to replace pixel modifications with adversarial perturbations, which provides a more robust evaluation framework. By using smoothness and monotonicity measures, we illustrate the effectiveness of our approach in correcting distribution shifts. In addition, we conduct the most comprehensive quantitative and qualitative assessment of attribution maps to date. Introducing baseline attribution maps as sanity checks, we find that our metric is the only contender to pass all checks. Using Kendall’s <span>(tau )</span> rank correlation coefficient, we show the increased consistency of our metric across 15 dataset-architecture combinations. Of the 16 attribution maps tested, our results clearly show SmoothGrad to be the best map currently available. This research makes an important contribution to the development of attribution maps by providing a reliable and consistent evaluation framework. To ensure reproducibility, we will provide the code along with our results.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"18 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2024-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142690526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
One-Shot Generative Domain Adaptation in 3D GANs
IF 19.5, Zone 2, Computer Science
International Journal of Computer Vision. Pub Date: 2024-11-22. DOI: 10.1007/s11263-024-02268-4
Ziqiang Li, Yi Wu, Chaoyue Wang, Xue Rui, Bin Li
{"title":"One-Shot Generative Domain Adaptation in 3D GANs","authors":"Ziqiang Li, Yi Wu, Chaoyue Wang, Xue Rui, Bin Li","doi":"10.1007/s11263-024-02268-4","DOIUrl":"https://doi.org/10.1007/s11263-024-02268-4","url":null,"abstract":"<p>3D-aware image generation necessitates extensive training data to ensure stable training and mitigate the risk of overfitting. This paper first consider a novel task known as One-shot 3D Generative Domain Adaptation (GDA), aimed at transferring a pre-trained 3D generator from one domain to a new one, relying solely on a single reference image. One-shot 3D GDA is characterized by the pursuit of specific attributes, namely, <i>high fidelity</i>, <i>large diversity</i>, <i>cross-domain consistency</i>, and <i>multi-view consistency</i>. Within this paper, we introduce 3D-Adapter, the first one-shot 3D GDA method, for diverse and faithful generation. Our approach begins by judiciously selecting a restricted weight set for fine-tuning, and subsequently leverages four advanced loss functions to facilitate adaptation. An efficient progressive fine-tuning strategy is also implemented to enhance the adaptation process. The synergy of these three technological components empowers 3D-Adapter to achieve remarkable performance, substantiated both quantitatively and qualitatively, across all desired properties of 3D GDA. Furthermore, 3D-Adapter seamlessly extends its capabilities to zero-shot scenarios, and preserves the potential for crucial tasks such as interpolation, reconstruction, and editing within the latent space of the pre-trained generator. Code will be available at https://github.com/iceli1007/3D-Adapter.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"61 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142684360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
NAFT and SynthStab: A RAFT-Based Network and a Synthetic Dataset for Digital Video Stabilization
IF 19.5, Zone 2, Computer Science
International Journal of Computer Vision. Pub Date: 2024-11-22. DOI: 10.1007/s11263-024-02264-8
Marcos Roberto e Souza, Helena de Almeida Maia, Helio Pedrini
{"title":"NAFT and SynthStab: A RAFT-Based Network and a Synthetic Dataset for Digital Video Stabilization","authors":"Marcos Roberto e Souza, Helena de Almeida Maia, Helio Pedrini","doi":"10.1007/s11263-024-02264-8","DOIUrl":"https://doi.org/10.1007/s11263-024-02264-8","url":null,"abstract":"<p>Multiple deep learning-based stabilization methods have been proposed recently. Some of them directly predict the optical flow to warp each unstable frame into its stabilized version, which we called direct warping. These methods primarily perform online or semi-online stabilization, prioritizing lower computational cost while achieving satisfactory results in certain scenarios. However, they fail to smooth intense instabilities and have considerably inferior results in comparison to other approaches. To improve their quality and reduce this difference, we propose: (a) NAFT, a new direct warping semi-online stabilization method, which adapts RAFT to videos by including a neighborhood-aware update mechanism, called IUNO. By using our training approach along with IUNO, we can learn the characteristics that contribute to video stability from the data patterns, rather than requiring an explicit stability definition. Furthermore, we demonstrate how leveraging an off-the-shelf video inpainting method to achieve full-frame stabilization; (b) SynthStab, a new synthetic dataset consisting of paired videos that allows supervision by camera motion instead of pixel similarities. To build SynthStab, we modeled camera motion using kinematic concepts. In addition, the unstable motion respects scene constraints, such as depth variation. We performed several experiments on SynthStab to develop and validate NAFT. We compared our results with five other methods from the literature with publicly available code. Our experimental results show that we were able to stabilize intense camera motion, outperforming other direct warping methods and bringing its performance closer to state-of-the-art methods. In terms of computational resources, our smallest network has only about 7% of model size and trainable parameters than the smallest values among the competing methods.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"24 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142690533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
CS-CoLBP: Cross-Scale Co-occurrence Local Binary Pattern for Image Classification
IF 19.5, Zone 2, Computer Science
International Journal of Computer Vision. Pub Date: 2024-11-19. DOI: 10.1007/s11263-024-02297-z
Bin Xiao, Danyu Shi, Xiuli Bi, Weisheng Li, Xinbo Gao
{"title":"CS-CoLBP: Cross-Scale Co-occurrence Local Binary Pattern for Image Classification","authors":"Bin Xiao, Danyu Shi, Xiuli Bi, Weisheng Li, Xinbo Gao","doi":"10.1007/s11263-024-02297-z","DOIUrl":"https://doi.org/10.1007/s11263-024-02297-z","url":null,"abstract":"<p>The local binary pattern (LBP) is an effective feature, describing the size relationship between the neighboring pixels and the current pixel. While individual LBP-based methods yield good results, co-occurrence LBP-based methods exhibit a better ability to extract structural information. However, most of the co-occurrence LBP-based methods excel mainly in dealing with rotated images, exhibiting limitations in preserving performance for scaled images. To address the issue, a cross-scale co-occurrence LBP (CS-CoLBP) is proposed. Initially, we construct an LBP co-occurrence space to capture robust structural features by simulating scale transformation. Subsequently, we use Cross-Scale Co-occurrence pairs (CS-Co pairs) to extract the structural features, keeping robust descriptions even in the presence of scaling. Finally, we refine these CS-Co pairs through Rotation Consistency Adjustment (RCA) to bolster their rotation invariance, thereby making the proposed CS-CoLBP as powerful as existing co-occurrence LBP-based methods for rotated image description. While keeping the desired geometric invariance, the proposed CS-CoLBP maintains a modest feature dimension. Empirical evaluations across several datasets demonstrate that CS-CoLBP outperforms the existing state-of-the-art LBP-based methods even in the presence of geometric transformations and image manipulations.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"53 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142673313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Warping the Residuals for Image Editing with StyleGAN
IF 19.5, Zone 2, Computer Science
International Journal of Computer Vision. Pub Date: 2024-11-18. DOI: 10.1007/s11263-024-02301-6
Ahmet Burak Yildirim, Hamza Pehlivan, Aysegul Dundar
{"title":"Warping the Residuals for Image Editing with StyleGAN","authors":"Ahmet Burak Yildirim, Hamza Pehlivan, Aysegul Dundar","doi":"10.1007/s11263-024-02301-6","DOIUrl":"https://doi.org/10.1007/s11263-024-02301-6","url":null,"abstract":"<p>StyleGAN models show editing capabilities via their semantically interpretable latent organizations which require successful GAN inversion methods to edit real images. Many works have been proposed for inverting images into StyleGAN’s latent space. However, their results either suffer from low fidelity to the input image or poor editing qualities, especially for edits that require large transformations. That is because low bit rate latent spaces lose many image details due to the information bottleneck even though it provides an editable space. On the other hand, higher bit rate latent spaces can pass all the image details to StyleGAN for perfect reconstruction of images but suffer from low editing qualities. In this work, we present a novel image inversion architecture that extracts high-rate latent features and includes a flow estimation module to warp these features to adapt them to edits. This is because edits often involve spatial changes in the image, such as adjustments to pose or smile. Thus, high-rate latent features must be accurately repositioned to match their new locations in the edited image space. We achieve this by employing flow estimation to determine the necessary spatial adjustments, followed by warping the features to align them correctly in the edited image. Specifically, we estimate the flows from StyleGAN features of edited and unedited latent codes. By estimating the high-rate features and warping them for edits, we achieve both high-fidelity to the input image and high-quality edits. We run extensive experiments and compare our method with state-of-the-art inversion methods. Qualitative metrics and visual comparisons show significant improvements.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"64 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142670356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Pulling Target to Source: A New Perspective on Domain Adaptive Semantic Segmentation
IF 19.5, Zone 2, Computer Science
International Journal of Computer Vision. Pub Date: 2024-11-16. DOI: 10.1007/s11263-024-02285-3
Haochen Wang, Yujun Shen, Jingjing Fei, Wei Li, Liwei Wu, Yuxi Wang, Zhaoxiang Zhang
{"title":"Pulling Target to Source: A New Perspective on Domain Adaptive Semantic Segmentation","authors":"Haochen Wang, Yujun Shen, Jingjing Fei, Wei Li, Liwei Wu, Yuxi Wang, Zhaoxiang Zhang","doi":"10.1007/s11263-024-02285-3","DOIUrl":"https://doi.org/10.1007/s11263-024-02285-3","url":null,"abstract":"<p>Domain-adaptive semantic segmentation aims to transfer knowledge from a labeled source domain to an unlabeled target domain. However, existing methods primarily focus on directly learning categorically discriminative target features for segmenting target images, which is challenging in the absence of target labels. This work provides a new perspective. We ob serve that the features learned with source data manage to keep categorically discriminative during training, thereby enabling us to implicitly learn adequate target representations by simply <i>pulling target features close to source features for each category</i>. To this end, we propose T2S-DA, which encourages the model to learn similar cross-domain features. Also, considering the pixel categories are heavily imbalanced for segmentation datasets, we come up with a dynamic re-weighting strategy to help the model concentrate on those underperforming classes. Extensive experiments confirm that T2S-DA learns a more discriminative and generalizable representation, significantly surpassing the state-of-the-art. We further show that T2S-DA is quite qualified for the domain generalization task, verifying its domain-invariant property.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"99 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2024-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142642626","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0