Latest Publications: 2023 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW)

MixGen: A New Multi-Modal Data Augmentation
Pub Date: 2022-06-16 | DOI: 10.1109/WACVW58289.2023.00042
Xiaoshuai Hao, Yi Zhu, Srikar Appalaraju, Aston Zhang, Wanqian Zhang, Boyang Li, Mu Li
Abstract: Data augmentation is a necessity for enhancing data efficiency in deep learning. For vision-language pre-training, previous works augment data only for images or only for text. In this paper, we present MixGen: a joint data augmentation for vision-language representation learning that further improves data efficiency. It generates new image-text pairs, with semantic relationships preserved, by interpolating images and concatenating text. It is simple and can be plugged into existing pipelines. We evaluate MixGen on four architectures, including CLIP, ViLT, ALBEF, and TCL, across five downstream vision-language tasks to show its versatility and effectiveness. For example, adding MixGen to ALBEF pre-training leads to absolute performance improvements on downstream tasks: image-text retrieval (+6.2% on COCO fine-tuned and +5.3% on Flickr30K zero-shot), visual grounding (+0.9% on RefCOCO+), visual reasoning (+0.9% on NLVR2), visual question answering (+0.3% on VQA2.0), and visual entailment (+0.4% on SNLI-VE).
Citations: 29
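The abstract states the augmentation concretely: interpolate a pair of images and concatenate their captions. A minimal sketch of that recipe follows; the in-batch pairing strategy and the interpolation weight `lam` are assumptions for illustration, not the paper's exact hyperparameters.

```python
import torch

def mixgen(images, texts, lam=0.5):
    """MixGen-style sketch: blend each image with another sample in the
    batch and concatenate the paired captions. `lam` and the pairing
    scheme are illustrative assumptions, not the paper's settings.

    images: (B, C, H, W) tensor; texts: list of B caption strings.
    """
    # Pair each sample i with sample (i+1) mod B from the same batch.
    idx = torch.roll(torch.arange(images.size(0)), shifts=-1)
    # Pixel-level interpolation preserves visual content from both images.
    mixed_images = lam * images + (1.0 - lam) * images[idx]
    # Text concatenation keeps the semantics of both captions intact.
    mixed_texts = [t + " " + texts[j] for t, j in zip(texts, idx.tolist())]
    return mixed_images, mixed_texts
```

Because the new pairs are built from samples already in the batch, the augmentation adds essentially no cost and can be dropped in front of any existing vision-language pre-training pipeline.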
The Gender Gap in Face Recognition Accuracy Is a Hairy Problem
Pub Date: 2022-06-10 | DOI: 10.1109/WACVW58289.2023.00034
Aman Bhatta, Vítor Albiero, K. Bowyer, M. King
Abstract: It is broadly accepted that there is a "gender gap" in face recognition accuracy, with females having lower accuracy. However, relatively little is known about the cause(s) of this gender gap. We first demonstrate that female and male hairstyles have important differences that impact face recognition accuracy. In particular, variation in male facial hair contributes to a greater average difference in appearance between different male faces. We then demonstrate that when the data used to evaluate recognition accuracy is gender-balanced for how hairstyles occlude the face, the initially observed gender gap in accuracy largely disappears. We show this result for two different matchers, and for a Caucasian image dataset and an African-American dataset. Our results suggest that research on demographic variation in accuracy should include a check for balanced quality of the test data as part of the problem formulation. This new understanding of the causes of the gender gap in recognition accuracy will hopefully promote rational consideration of what might be done about it. To promote reproducible research, the matchers, attribute classifiers, and datasets used in this work are available to other researchers.
Citations: 9
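The balancing check described above — evaluating on test data matched across genders for how hairstyles occlude the face — can be illustrated with a small subsampling routine. This is a hypothetical sketch: the `occlusion` and `gender` attributes and the binning scheme are assumptions for illustration, not the authors' released tooling.

```python
import random

def balance_by_occlusion(samples, bins=10, seed=0):
    """Hypothetical sketch: subsample so female and male test images
    have matching distributions of a face-occlusion measure (e.g. the
    fraction of the face region covered by hair). Each sample is assumed
    to be a dict with "occlusion" in [0, 1] and "gender" in {"F", "M"}.
    """
    rng = random.Random(seed)
    # Bucket samples by discretized occlusion level, separately per gender.
    buckets = {}
    for s in samples:
        key = (min(int(s["occlusion"] * bins), bins - 1), s["gender"])
        buckets.setdefault(key, []).append(s)
    balanced = []
    for b in range(bins):
        female = buckets.get((b, "F"), [])
        male = buckets.get((b, "M"), [])
        n = min(len(female), len(male))  # equal count per occlusion level
        balanced += rng.sample(female, n) + rng.sample(male, n)
    return balanced
```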
BAPose: Bottom-Up Pose Estimation with Disentangled Waterfall Representations
Pub Date: 2021-12-20 | DOI: 10.1109/WACVW58289.2023.00059
Bruno Artacho, A. Savakis
Abstract: We propose BAPose, a novel bottom-up approach that achieves state-of-the-art results for multi-person pose estimation. Our end-to-end trainable framework leverages a disentangled multi-scale waterfall architecture and incorporates adaptive convolutions to infer keypoints more precisely in crowded scenes with occlusions. The multi-scale representations, obtained by the disentangled waterfall module in BAPose, leverage the efficiency of progressive filtering in the cascade architecture, while maintaining multi-scale fields of view comparable to spatial pyramid configurations. Our results on the challenging COCO and CrowdPose datasets demonstrate that BAPose is an efficient and robust framework for multi-person pose estimation, significantly improving state-of-the-art accuracy.
Keywords: Human Pose Estimation, Multi-Scale Representations
Citations: 2
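To make "progressive filtering in the cascade architecture" with pyramid-like "multi-scale fields of view" concrete, here is an illustrative waterfall-style module: dilated-convolution branches run in a cascade (each consumes the previous branch's output) and all branch outputs are concatenated. The channel widths and dilation rates are assumptions for the sketch; this is not BAPose's verified configuration.

```python
import torch
import torch.nn as nn

class WaterfallModule(nn.Module):
    """Illustrative waterfall atrous module: a cascade of dilated
    convolutions (progressive filtering) whose outputs are all
    concatenated, combining cascade efficiency with spatial-pyramid-like
    multi-scale fields of view. Hyperparameters are assumptions."""

    def __init__(self, in_ch, branch_ch=64, dilations=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList()
        ch = in_ch
        for d in dilations:
            self.branches.append(nn.Sequential(
                nn.Conv2d(ch, branch_ch, 3, padding=d, dilation=d),
                nn.ReLU(inplace=True)))
            ch = branch_ch  # the next branch consumes the previous output
        self.project = nn.Conv2d(branch_ch * len(dilations), in_ch, 1)

    def forward(self, x):
        outs, h = [], x
        for branch in self.branches:
            h = branch(h)   # cascade: each stage filters the last stage
            outs.append(h)  # keep every scale for the final concatenation
        return self.project(torch.cat(outs, dim=1))
```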
Exploiting Inter-pixel Correlations in Unsupervised Domain Adaptation for Semantic Segmentation
Pub Date: 2021-10-21 | DOI: 10.1109/WACVW58289.2023.00006
Inseop Chung, Jayeon Yoo, Nojun Kwak
Abstract: "Self-training" has become a dominant method for semantic segmentation via unsupervised domain adaptation (UDA). It creates a set of pseudo labels for the target domain to give explicit supervision. However, the pseudo labels are noisy, sparse, and do not provide any information about inter-pixel correlations. We regard inter-pixel correlation as quite important because semantic segmentation is a task of predicting highly structured pixel-level outputs. Therefore, in this paper, we propose a method of transferring the inter-pixel correlations from the source domain to the target domain via a self-attention module. The module takes the prediction of the segmentation network as an input and creates a self-attended prediction that correlates similar pixels. The module is trained only on the source domain to learn the domain-invariant inter-pixel correlations; later, it is used to train the segmentation network on the target domain. The network learns not only from the pseudo labels but also by following the output of the self-attention module, which provides additional knowledge about the inter-pixel correlations. Through extensive experiments, we show that our method significantly improves the performance on two standard UDA benchmarks and can also be combined with a recent state-of-the-art method to achieve better performance.
Citations: 2
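A minimal sketch of the self-attended prediction idea: pixel-to-pixel affinities are computed from the network's own output and used to re-weight each pixel's prediction by those of similar pixels. The abstract describes a trained module; the closed-form attention below (softmax probabilities serving as both attention features and values, with an assumed `temperature`) only illustrates the mechanism.

```python
import torch
import torch.nn.functional as F

def self_attended_prediction(logits, temperature=0.1):
    """Sketch of a self-attended segmentation prediction. Not the paper's
    trained module: features, values, and `temperature` are assumptions.

    logits: (B, C, H, W) segmentation network output.
    Note the (HW x HW) affinity matrix is memory-heavy; this sketch is
    meant for small prediction maps.
    """
    B, C, H, W = logits.shape
    probs = logits.softmax(dim=1).flatten(2)   # (B, C, H*W)
    feats = F.normalize(probs, dim=1)          # cosine-style pixel features
    # Row i holds pixel i's affinity to every other pixel.
    attn = torch.softmax(
        torch.bmm(feats.transpose(1, 2), feats) / temperature, dim=-1)
    # Re-weight each pixel's class distribution by its similar peers.
    out = torch.bmm(probs, attn.transpose(1, 2))
    return out.view(B, C, H, W)
```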
Bringing Generalization to Deep Multi-View Pedestrian Detection
Pub Date: 2021-09-24 | DOI: 10.1109/WACVW58289.2023.00016
Jeet K. Vora, Swetanjal Dutta, Kanishk Jain, Shyamgopal Karthik, Vineet Gandhi
Abstract: Multi-View Detection (MVD) is highly effective for occlusion reasoning in a crowded environment. While recent works using deep learning have made significant advances in the field, they have overlooked the generalization aspect, which makes them impractical for real-world deployment. The key novelty of our work is to formalize three critical forms of generalization and propose experiments to evaluate them: generalization with (i) a varying number of cameras, (ii) varying camera positions, and finally, (iii) to new scenes. We find that existing state-of-the-art models show poor generalization by overfitting to a single scene and camera configuration. To address these concerns, (a) we propose a novel Generalized MVD (GMVD) dataset, assimilating diverse scenes with changing daytime, camera configurations, and a varying number of cameras, and (b) we discuss the properties essential to bring generalization to MVD and propose a barebones model incorporating them. We present a comprehensive set of experiments on the WildTrack, MultiViewX, and GMVD datasets to motivate the necessity of evaluating the generalization abilities of MVD methods and to demonstrate the efficacy of the proposed approach. The code and dataset are available at https://github.com/jeetv/GMVD.
Citations: 2
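One property commonly needed for the first form of generalization — a varying number of cameras — is a permutation-invariant aggregation of per-camera features on the shared ground plane, sketched below. Treating average pooling as the aggregator is an assumption about the barebones model for illustration, not a verified architectural detail.

```python
import torch

def aggregate_ground_plane(per_camera_feats):
    """Permutation-invariant aggregation sketch for multi-view detection.
    Averaging (rather than concatenating a fixed number of views) makes
    the fused map indifferent to camera count and ordering — an assumed
    design choice here, not GMVD's verified one.

    per_camera_feats: list of (C, H, W) tensors, one per camera, already
    warped onto the shared ground-plane grid.
    """
    stacked = torch.stack(per_camera_feats)  # (num_cams, C, H, W)
    return stacked.mean(dim=0)               # same shape for any camera count
```

By contrast, a model that concatenates per-camera maps along the channel axis bakes the training camera count and order into its weights, which is one way overfitting to a single camera configuration arises.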