BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference最新文献_第10页

Flow-based GAN for 3D Point Cloud Generation from a Single Image 基于流的GAN从单幅图像生成三维点云

BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference Pub Date : 2022-10-08 DOI: 10.48550/arXiv.2210.04072

Yao Wei, G. Vosselman, M. Yang

引用次数: 3

Revisiting Self-Supervised Contrastive Learning for Facial Expression Recognition 面部表情识别的自我监督对比学习研究

BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference Pub Date : 2022-10-08 DOI: 10.48550/arXiv.2210.03853

Yuxuan Shu, Xiao Gu, Guangyao Yang, Benny P. L. Lo

引用次数: 9

Learning Fine-Grained Visual Understanding for Video Question Answering via Decoupling Spatial-Temporal Modeling 通过解耦时空建模学习视频问答的细粒度视觉理解

BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference Pub Date : 2022-10-08 DOI: 10.48550/arXiv.2210.03941

Hsin-Ying Lee, Hung-Ting Su, Bing-Chen Tsai, Tsung-Han Wu, Jia-Fong Yeh, Winston H. Hsu

{"title":"Learning Fine-Grained Visual Understanding for Video Question Answering via Decoupling Spatial-Temporal Modeling","authors":"Hsin-Ying Lee, Hung-Ting Su, Bing-Chen Tsai, Tsung-Han Wu, Jia-Fong Yeh, Winston H. Hsu","doi":"10.48550/arXiv.2210.03941","DOIUrl":"https://doi.org/10.48550/arXiv.2210.03941","url":null,"abstract":"While recent large-scale video-language pre-training made great progress in video question answering, the design of spatial modeling of video-language models is less fine-grained than that of image-language models; existing practices of temporal modeling also suffer from weak and noisy alignment between modalities. To learn fine-grained visual understanding, we decouple spatial-temporal modeling and propose a hybrid pipeline, Decoupled Spatial-Temporal Encoders, integrating an image- and a video-language encoder. The former encodes spatial semantics from larger but sparsely sampled frames independently of time, while the latter models temporal dynamics at lower spatial but higher temporal resolution. To help the video-language model learn temporal relations for video QA, we propose a novel pre-training objective, Temporal Referring Modeling, which requires the model to identify temporal positions of events in video sequences. Extensive experiments demonstrate that our model outperforms previous work pre-trained on orders of magnitude larger datasets.","PeriodicalId":72437,"journal":{"name":"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference","volume":"34 1","pages":"116"},"PeriodicalIF":0.0,"publicationDate":"2022-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80313780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Dual Pyramid Generative Adversarial Networks for Semantic Image Synthesis 语义图像合成的双金字塔生成对抗网络

BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference Pub Date : 2022-10-08 DOI: 10.48550/arXiv.2210.04085

Shijie Li, Ming-Ming Cheng, Juergen Gall

引用次数: 6

Multiple Object Tracking from appearance by hierarchically clustering tracklets 多目标跟踪从外观分层聚类跟踪

BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference Pub Date : 2022-10-07 DOI: 10.48550/arXiv.2210.03355

Andreu Girbau, F. Marqu'es, Shin’ichi Satoh

引用次数: 3

SVL-Adapter: Self-Supervised Adapter for Vision-Language Pretrained Models SVL-Adapter:视觉语言预训练模型的自监督适配器

BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference Pub Date : 2022-10-07 DOI: 10.48550/arXiv.2210.03794

Omiros Pantazis, G. Brostow, Kate Jones, Oisin Mac Aodha

{"title":"SVL-Adapter: Self-Supervised Adapter for Vision-Language Pretrained Models","authors":"Omiros Pantazis, G. Brostow, Kate Jones, Oisin Mac Aodha","doi":"10.48550/arXiv.2210.03794","DOIUrl":"https://doi.org/10.48550/arXiv.2210.03794","url":null,"abstract":"Vision-language models such as CLIP are pretrained on large volumes of internet sourced image and text pairs, and have been shown to sometimes exhibit impressive zero- and low-shot image classification performance. However, due to their size, fine-tuning these models on new datasets can be prohibitively expensive, both in terms of the supervision and compute required. To combat this, a series of light-weight adaptation methods have been proposed to efficiently adapt such models when limited supervision is available. In this work, we show that while effective on internet-style datasets, even those remedies under-deliver on classification tasks with images that differ significantly from those commonly found online. To address this issue, we present a new approach called SVL-Adapter that combines the complementary strengths of both vision-language pretraining and self-supervised representation learning. We report an average classification accuracy improvement of 10% in the low-shot setting when compared to existing methods, on a set of challenging visual classification tasks. Further, we present a fully automatic way of selecting an important blending hyperparameter for our model that does not require any held-out labeled validation data. Code for our project is available here: https://github.com/omipan/svl_adapter.","PeriodicalId":72437,"journal":{"name":"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference","volume":"63 1","pages":"580"},"PeriodicalIF":0.0,"publicationDate":"2022-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77140803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 12

A Simple Plugin for Transforming Images to Arbitrary Scales 一个简单的插件转换图像到任意尺度

BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference Pub Date : 2022-10-07 DOI: 10.48550/arXiv.2210.03417

Qinye Zhou, Zi-Hua Li, Weidi Xie, Xiaoyun Zhang, Ya Zhang, Yanfeng Wang

{"title":"A Simple Plugin for Transforming Images to Arbitrary Scales","authors":"Qinye Zhou, Zi-Hua Li, Weidi Xie, Xiaoyun Zhang, Ya Zhang, Yanfeng Wang","doi":"10.48550/arXiv.2210.03417","DOIUrl":"https://doi.org/10.48550/arXiv.2210.03417","url":null,"abstract":"Existing models on super-resolution often specialized for one scale, fundamentally limiting their use in practical scenarios. In this paper, we aim to develop a general plugin that can be inserted into existing super-resolution models, conveniently augmenting their ability towards Arbitrary Resolution Image Scaling, thus termed ARIS. We make the following contributions: (i) we propose a transformer-based plugin module, which uses spatial coordinates as query, iteratively attend the low-resolution image feature through cross-attention, and output visual feature for the queried spatial location, resembling an implicit representation for images; (ii) we introduce a novel self-supervised training scheme, that exploits consistency constraints to effectively augment the model's ability for upsampling images towards unseen scales, i.e. ground-truth high-resolution images are not available; (iii) without loss of generality, we inject the proposed ARIS plugin module into several existing models, namely, IPT, SwinIR, and HAT, showing that the resulting models can not only maintain their original performance on fixed scale factor but also extrapolate to unseen scales, substantially outperforming existing any-scale super-resolution models on standard benchmarks, e.g. Urban100, DIV2K, etc.","PeriodicalId":72437,"journal":{"name":"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference","volume":"25 1","pages":"107"},"PeriodicalIF":0.0,"publicationDate":"2022-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88208443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

Spatio-temporal Tendency Reasoning for Human Body Pose and Shape Estimation from Videos 基于视频的人体姿态和形状估计的时空趋势推理

BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference Pub Date : 2022-10-07 DOI: 10.48550/arXiv.2210.03659

Boyang Zhang, Suping Wu, Hu Cao, Kehua Ma, Pan Li, Lei Lin

{"title":"Spatio-temporal Tendency Reasoning for Human Body Pose and Shape Estimation from Videos","authors":"Boyang Zhang, Suping Wu, Hu Cao, Kehua Ma, Pan Li, Lei Lin","doi":"10.48550/arXiv.2210.03659","DOIUrl":"https://doi.org/10.48550/arXiv.2210.03659","url":null,"abstract":"In this paper, we present a spatio-temporal tendency reasoning (STR) network for recovering human body pose and shape from videos. Previous approaches have focused on how to extend 3D human datasets and temporal-based learning to promote accuracy and temporal smoothing. Different from them, our STR aims to learn accurate and natural motion sequences in an unconstrained environment through temporal and spatial tendency and to fully excavate the spatio-temporal features of existing video data. To this end, our STR learns the representation of features in the temporal and spatial dimensions respectively, to concentrate on a more robust representation of spatio-temporal features. More specifically, for efficient temporal modeling, we first propose a temporal tendency reasoning (TTR) module. TTR constructs a time-dimensional hierarchical residual connection representation within a video sequence to effectively reason temporal sequences' tendencies and retain effective dissemination of human information. Meanwhile, for enhancing the spatial representation, we design a spatial tendency enhancing (STE) module to further learns to excite spatially time-frequency domain sensitive features in human motion information representations. Finally, we introduce integration strategies to integrate and refine the spatio-temporal feature representations. Extensive experimental findings on large-scale publically available datasets reveal that our STR remains competitive with the state-of-the-art on three datasets. Our code are available at https://github.com/Changboyang/STR.git.","PeriodicalId":72437,"journal":{"name":"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference","volume":"3 1","pages":"719"},"PeriodicalIF":0.0,"publicationDate":"2022-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83940654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

Instance Segmentation of Dense and Overlapping Objects via Layering 基于分层的密集和重叠对象的实例分割

BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference Pub Date : 2022-10-07 DOI: 10.48550/arXiv.2210.03551

Long Chen, Yuli Wu, D. Merhof

引用次数: 2

Humans need not label more humans: Occlusion Copy & Paste for Occluded Human Instance Segmentation 人类不需要标记更多的人类:遮挡复制和粘贴遮挡的人类实例分割

BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference Pub Date : 2022-10-07 DOI: 10.48550/arXiv.2210.03686

Evan Ling, De-Kai Huang, Minhoe Hur

{"title":"Humans need not label more humans: Occlusion Copy & Paste for Occluded Human Instance Segmentation","authors":"Evan Ling, De-Kai Huang, Minhoe Hur","doi":"10.48550/arXiv.2210.03686","DOIUrl":"https://doi.org/10.48550/arXiv.2210.03686","url":null,"abstract":"Modern object detection and instance segmentation networks stumble when picking out humans in crowded or highly occluded scenes. Yet, these are often scenarios where we require our detectors to work well. Many works have approached this problem with model-centric improvements. While they have been shown to work to some extent, these supervised methods still need sufficient relevant examples (i.e. occluded humans) during training for the improvements to be maximised. In our work, we propose a simple yet effective data-centric approach, Occlusion Copy&Paste, to introduce occluded examples to models during training - we tailor the general copy&paste augmentation approach to tackle the difficult problem of same-class occlusion. It improves instance segmentation performance on occluded scenarios for\"free\"just by leveraging on existing large-scale datasets, without additional data or manual labelling needed. In a principled study, we show whether various proposed add-ons to the copy&paste augmentation indeed contribute to better performance. Our Occlusion Copy&Paste augmentation is easily interoperable with any models: by simply applying it to a recent generic instance segmentation model without explicit model architectural design to tackle occlusion, we achieve state-of-the-art instance segmentation performance on the very challenging OCHuman dataset. Source code is available at https://github.com/levan92/occlusion-copy-paste.","PeriodicalId":72437,"journal":{"name":"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference","volume":"122 1","pages":"329"},"PeriodicalIF":0.0,"publicationDate":"2022-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74684521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1