International Journal of Computer Vision: Latest Articles

Building 3D Generative Models from Minimal Data
Tier 2, Computer Science
International Journal of Computer Vision Pub Date: 2023-09-13 DOI: 10.1007/s11263-023-01870-2
Skylar Sutherland, Bernhard Egger, Joshua Tenenbaum
{"title":"Building 3D Generative Models from Minimal Data","authors":"Skylar Sutherland, Bernhard Egger, Joshua Tenenbaum","doi":"10.1007/s11263-023-01870-2","DOIUrl":"https://doi.org/10.1007/s11263-023-01870-2","url":null,"abstract":"Abstract We propose a method for constructing generative models of 3D objects from a single 3D mesh and improving them through unsupervised low-shot learning from 2D images. Our method produces a 3D morphable model that represents shape and albedo in terms of Gaussian processes. Whereas previous approaches have typically built 3D morphable models from multiple high-quality 3D scans through principal component analysis, we build 3D morphable models from a single scan or template. As we demonstrate in the face domain, these models can be used to infer 3D reconstructions from 2D data (inverse graphics) or 3D data (registration). Specifically, we show that our approach can be used to perform face recognition using only a single 3D template (one scan total, not one per person). We extend our model to a preliminary unsupervised learning framework that enables the learning of the distribution of 3D faces using one 3D template and a small number of 2D images. Our approach is motivated as a potential model for the origins of face perception in human infants, who appear to start with an innate face template and subsequently develop a flexible system for perceiving the 3D structure of any novel face from experience with only 2D images of a relatively small number of familiar faces.","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135740464","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
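The Gaussian-process shape prior described in the abstract above can be illustrated with a short sketch. This is not the authors' implementation: the RBF kernel, its length scale and variance, and the toy random "mesh" standing in for a scanned template are all placeholder assumptions. The sketch only shows the core idea of drawing smooth per-vertex deformations from a GP centered on a single template.

```python
import numpy as np

def rbf_kernel(X, Y, length_scale=0.3, variance=0.01):
    """Squared-exponential kernel over 3D vertex positions."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / length_scale**2)

def sample_shapes(template_vertices, n_samples=1, jitter=1e-6, seed=0):
    """Draw smooth per-vertex deformations from a GP prior centered
    on the template, one independent GP per coordinate axis."""
    rng = np.random.default_rng(seed)
    V = template_vertices                          # (N, 3)
    K = rbf_kernel(V, V)                           # (N, N) covariance
    L = np.linalg.cholesky(K + jitter * np.eye(len(V)))
    z = rng.standard_normal((n_samples, len(V), 3))
    # Correlated deformation fields: L @ z, applied around the template.
    return V[None] + np.einsum('ij,sjk->sik', L, z)

# Toy stand-in for a single scanned mesh: 100 random "vertices".
template = np.random.default_rng(1).uniform(-1, 1, (100, 3))
novel_shapes = sample_shapes(template, n_samples=5)
print(novel_shapes.shape)  # (5, 100, 3): five sampled shape variants
```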
The Curious Layperson: Fine-Grained Image Recognition Without Expert Labels
Tier 2, Computer Science
International Journal of Computer Vision Pub Date: 2023-09-13 DOI: 10.1007/s11263-023-01885-9
Subhabrata Choudhury, Iro Laina, Christian Rupprecht, Andrea Vedaldi
{"title":"The Curious Layperson: Fine-Grained Image Recognition Without Expert Labels","authors":"Subhabrata Choudhury, Iro Laina, Christian Rupprecht, Andrea Vedaldi","doi":"10.1007/s11263-023-01885-9","DOIUrl":"https://doi.org/10.1007/s11263-023-01885-9","url":null,"abstract":"Abstract Most of us are not experts in specific fields, such as ornithology. Nonetheless, we do have general image and language understanding capabilities that we use to match what we see to expert resources. This allows us to expand our knowledge and perform novel tasks without ad-hoc external supervision. On the contrary, machines have a much harder time consulting expert-curated knowledge bases unless trained specifically with that knowledge in mind. Thus, in this paper we consider a new problem: fine-grained image recognition without expert annotations, which we address by leveraging the vast knowledge available in web encyclopedias. First, we learn a model to describe the visual appearance of objects using non-expert image descriptions. We then train a fine-grained textual similarity model that matches image descriptions with documents on a sentence-level basis. We evaluate the method on two datasets (CUB-200 and Oxford-102 Flowers) and compare with several strong baselines and the state of the art in cross-modal retrieval. Code is available at: https://github.com/subhc/clever .","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"147 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134989642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
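The sentence-level matching step in the abstract above can be sketched with a toy stand-in. Here TF-IDF embeddings replace the learned fine-grained textual similarity model, and the scoring rule (take each description sentence's best match in the document, then average) is an illustrative assumption, not the paper's exact objective.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def document_score(description_sentences, document_sentences):
    """Score one expert document against a predicted image description:
    embed all sentences, find each description sentence's best-matching
    document sentence, and average those maxima."""
    vec = TfidfVectorizer().fit(description_sentences + document_sentences)
    D = vec.transform(description_sentences)   # (num_desc, vocab)
    E = vec.transform(document_sentences)      # (num_doc, vocab)
    sim = cosine_similarity(D, E)              # sentence-pair similarities
    return sim.max(axis=1).mean()

desc = ["A small bird with a bright red crown.",
        "The wings are grey with white bars."]
doc = ["The male has a red crown and nape.",
       "Wings show two white wing bars.",
       "It prefers coniferous forests."]
print(round(document_score(desc, doc), 3))
```

Ranking every encyclopedia document by this score would then give a class prediction for the image, which is the general shape of the retrieval setup the abstract describes.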
MineGAN++: Mining Generative Models for Efficient Knowledge Transfer to Limited Data Domains
Tier 2, Computer Science
International Journal of Computer Vision Pub Date: 2023-09-09 DOI: 10.1007/s11263-023-01882-y
Yaxing Wang, Abel Gonzalez-Garcia, Chenshen Wu, Luis Herranz, Fahad Shahbaz Khan, Shangling Jui, Joost van de Weijer
{"title":"MineGAN++: Mining Generative Models for Efficient Knowledge Transfer to Limited Data Domains","authors":"Yaxing Wang, Abel Gonzalez-Garcia, Chenshen Wu, Luis Herranz, Fahad Shahbaz Khan, Shangling Jui, Joost van de Weijer","doi":"10.1007/s11263-023-01882-y","DOIUrl":"https://doi.org/10.1007/s11263-023-01882-y","url":null,"abstract":"GANs largely increases the potential impact of generative models. Therefore, we propose a novel knowledge transfer method for generative models based on mining the knowledge that is most beneficial to a specific target domain, either from a single or multiple pretrained GANs. This is done using a miner network that identifies which part of the generative distribution of each pretrained GAN outputs samples closest to the target domain. Mining effectively steers GAN sampling towards suitable regions of the latent space, which facilitates the posterior finetuning and avoids pathologies of other methods, such as mode collapse and lack of flexibility. Furthermore, to prevent overfitting on small target domains, we introduce sparse subnetwork selection, that restricts the set of trainable neurons to those that are relevant for the target dataset. We perform comprehensive experiments on several challenging datasets using various GAN architectures (BigGAN, Progressive GAN, and StyleGAN) and show that the proposed method, called MineGAN, effectively transfers knowledge to domains with few target images, outperforming existing methods. In addition, MineGAN can successfully transfer knowledge from multiple pretrained GANs.","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136108417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
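As a rough illustration of the mining idea in the abstract above, the sketch below places a small trainable MLP (the "miner") in front of a frozen pretrained generator, so only the latent codes are steered toward the target domain. The latent dimension, the miner architecture, and the schematic loss in the comments are assumptions; the actual MineGAN training also uses a critic and a later fine-tuning stage.

```python
import torch
import torch.nn as nn

class Miner(nn.Module):
    """Small MLP that remaps latent codes toward the regions of a
    pretrained generator's latent space closest to the target domain."""
    def __init__(self, latent_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, latent_dim), nn.ReLU(),
            nn.Linear(latent_dim, latent_dim),
        )

    def forward(self, z):
        return self.net(z)

def freeze(module):
    """Fix the pretrained generator's weights; gradients still flow
    through it to the miner's parameters during training."""
    for p in module.parameters():
        p.requires_grad_(False)
    return module

# Schematic use (G is any pretrained generator taking a latent batch;
# load_pretrained_generator, critic, and adversarial_loss are hypothetical):
#   G = freeze(load_pretrained_generator())
#   miner = Miner(latent_dim=128)
#   z = torch.randn(16, 128)
#   fake = G(miner(z))                 # sampling steered by the miner
#   loss = adversarial_loss(critic(fake))   # update miner + critic only
```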
One-Pot Multi-frame Denoising
Tier 2, Computer Science
International Journal of Computer Vision Pub Date: 2023-09-09 DOI: 10.1007/s11263-023-01887-7
Lujia Jin, Qing Guo, Shi Zhao, Lei Zhu, Qian Chen, Qiushi Ren, Yanye Lu
{"title":"One-Pot Multi-frame Denoising","authors":"Lujia Jin, Qing Guo, Shi Zhao, Lei Zhu, Qian Chen, Qiushi Ren, Yanye Lu","doi":"10.1007/s11263-023-01887-7","DOIUrl":"https://doi.org/10.1007/s11263-023-01887-7","url":null,"abstract":"","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136192056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Deep Richardson–Lucy Deconvolution for Low-Light Image Deblurring
Tier 2, Computer Science
International Journal of Computer Vision Pub Date: 2023-09-07 DOI: 10.1007/s11263-023-01877-9
Liang Chen, Jiawei Zhang, Zhenhua Li, Yunxuan Wei, Faming Fang, Jimmy Ren, Jinshan Pan
{"title":"Deep Richardson–Lucy Deconvolution for Low-Light Image Deblurring","authors":"Liang Chen, Jiawei Zhang, Zhenhua Li, Yunxuan Wei, Faming Fang, Jimmy Ren, Jinshan Pan","doi":"10.1007/s11263-023-01877-9","DOIUrl":"https://doi.org/10.1007/s11263-023-01877-9","url":null,"abstract":"","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135048151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Transferring Vision-Language Models for Visual Recognition: A Classifier Perspective
IF 19.5, Tier 2, Computer Science
International Journal of Computer Vision Pub Date: 2023-09-07 DOI: 10.1007/s11263-023-01876-w
Wenhao Wu, Zhun Sun, Yuxin Song, Jingdong Wang, Wanli Ouyang
{"title":"Transferring Vision-Language Models for Visual Recognition: A Classifier Perspective","authors":"Wenhao Wu, Zhun Sun, Yuxin Song, Jingdong Wang, Wanli Ouyang","doi":"10.1007/s11263-023-01876-w","DOIUrl":"https://doi.org/10.1007/s11263-023-01876-w","url":null,"abstract":"","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":" ","pages":""},"PeriodicalIF":19.5,"publicationDate":"2023-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48889613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Learning Robust Facial Representation From the View of Diversity and Closeness
IF 19.5, Tier 2, Computer Science
International Journal of Computer Vision Pub Date: 2023-09-07 DOI: 10.1007/s11263-023-01893-9
Chaoyu Zhao, J. Qian, Shumin Zhu, J. Xie, Jian Yang
{"title":"Learning Robust Facial Representation From the View of Diversity and Closeness","authors":"Chaoyu Zhao, J. Qian, Shumin Zhu, J. Xie, Jian Yang","doi":"10.1007/s11263-023-01893-9","DOIUrl":"https://doi.org/10.1007/s11263-023-01893-9","url":null,"abstract":"","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":" ","pages":""},"PeriodicalIF":19.5,"publicationDate":"2023-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42897991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Intra- & Extra-Source Exemplar-Based Style Synthesis for Improved Domain Generalization
Tier 2, Computer Science
International Journal of Computer Vision Pub Date: 2023-09-07 DOI: 10.1007/s11263-023-01878-8
Yumeng Li, Dan Zhang, Margret Keuper, Anna Khoreva
{"title":"Intra- &amp; Extra-Source Exemplar-Based Style Synthesis for Improved Domain Generalization","authors":"Yumeng Li, Dan Zhang, Margret Keuper, Anna Khoreva","doi":"10.1007/s11263-023-01878-8","DOIUrl":"https://doi.org/10.1007/s11263-023-01878-8","url":null,"abstract":"Abstract The generalization with respect to domain shifts, as they frequently appear in applications such as autonomous driving, is one of the remaining big challenges for deep learning models. Therefore, we propose an exemplar-based style synthesis pipeline to improve domain generalization in semantic segmentation. Our method is based on a novel masked noise encoder for StyleGAN2 inversion. The model learns to faithfully reconstruct the image, preserving its semantic layout through noise prediction. Random masking of the estimated noise enables the style mixing capability of our model, i.e. it allows to alter the global appearance without affecting the semantic layout of an image. Using the proposed masked noise encoder to randomize style and content combinations in the training set, i.e., intra-source style augmentation ( $$textrm{ISSA}$$ <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\"> <mml:mtext>ISSA</mml:mtext> </mml:math> ) effectively increases the diversity of training data and reduces spurious correlation. As a result, we achieve up to $$12.4%$$ <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\"> <mml:mrow> <mml:mn>12.4</mml:mn> <mml:mo>%</mml:mo> </mml:mrow> </mml:math> mIoU improvements on driving-scene semantic segmentation under different types of data shifts, i.e., changing geographic locations, adverse weather conditions, and day to night. $$textrm{ISSA}$$ <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\"> <mml:mtext>ISSA</mml:mtext> </mml:math> is model-agnostic and straightforwardly applicable with CNNs and Transformers. It is also complementary to other domain generalization techniques, e.g., it improves the recent state-of-the-art solution RobustNet by $$3%$$ <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\"> <mml:mrow> <mml:mn>3</mml:mn> <mml:mo>%</mml:mo> </mml:mrow> </mml:math> mIoU in Cityscapes to Dark Zürich. In addition, we demonstrate the strong plug-n-play ability of the proposed style synthesis pipeline, which is readily usable for extra-source exemplars e.g., web-crawled images, without any retraining or fine-tuning. Moreover, we study a new use case to indicate neural network’s generalization capability by building a stylized proxy validation set. This application has significant practical sense for selecting models to be deployed in the open-world environment. Our code is available at https://github.com/boschresearch/ISSA .","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135048150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
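The noise-masking mechanism described in the ISSA abstract above might look roughly like the sketch below. The interface is hypothetical: real StyleGAN2 implementations keep noise in internal buffers rather than accepting per-layer noise tensors as an argument, and the mask ratio and the fallback to fresh Gaussian noise are illustrative assumptions rather than the paper's exact procedure.

```python
import torch

def mix_style_and_content(synthesis_fn, w_style, content_noise,
                          mask_ratio=0.5):
    """Combine one image's global style code (w_style) with another
    image's layout, carried by its encoder-predicted noise maps.
    Randomly masked noise entries fall back to fresh Gaussian noise,
    altering appearance while preserving the semantic layout."""
    mixed = []
    for n in content_noise:                      # one map per resolution
        keep = (torch.rand_like(n) > mask_ratio).float()
        mixed.append(keep * n + (1 - keep) * torch.randn_like(n))
    # synthesis_fn: hypothetical wrapper around a StyleGAN2 synthesis
    # network that accepts explicit per-layer noise maps.
    return synthesis_fn(w_style, noise=mixed)
```

Sweeping mask_ratio between 0 and 1 would trade off how much of the content image's fine-grained layout survives the style transfer, which is one plausible reading of the "style mixing capability" the abstract mentions.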
Blind Image Deblurring with Unknown Kernel Size and Substantial Noise
Tier 2, Computer Science
International Journal of Computer Vision Pub Date: 2023-09-04 DOI: 10.1007/s11263-023-01883-x
Zhong Zhuang, Taihui Li, Hengkang Wang, Ju Sun
{"title":"Blind Image Deblurring with Unknown Kernel Size and Substantial Noise","authors":"Zhong Zhuang, Taihui Li, Hengkang Wang, Ju Sun","doi":"10.1007/s11263-023-01883-x","DOIUrl":"https://doi.org/10.1007/s11263-023-01883-x","url":null,"abstract":"","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135404498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Towards Defending Multiple $\ell_p$-Norm Bounded Adversarial Perturbations via Gated Batch Normalization
IF 19.5, Tier 2, Computer Science
International Journal of Computer Vision Pub Date: 2023-09-04 DOI: 10.1007/s11263-023-01884-w
Aishan Liu, Shiyu Tang, Xinyun Chen, Lei Huang, Haotong Qin, Xianglong Liu, Dacheng Tao
{"title":"Towards Defending Multiple $$ell _p$$-Norm Bounded Adversarial Perturbations via Gated Batch Normalization","authors":"Aishan Liu, Shiyu Tang, Xinyun Chen, Lei Huang, Haotong Qin, Xianglong Liu, Dacheng Tao","doi":"10.1007/s11263-023-01884-w","DOIUrl":"https://doi.org/10.1007/s11263-023-01884-w","url":null,"abstract":"","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":" ","pages":""},"PeriodicalIF":19.5,"publicationDate":"2023-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43182019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1