{"title":"VL4Pose: Active Learning Through Out-Of-Distribution Detection For Pose Estimation","authors":"Megh Shukla, Roshan Roy, Pankaj Singh, Shuaib Ahmed, Alexandre Alahi","doi":"10.48550/arXiv.2210.06028","DOIUrl":"https://doi.org/10.48550/arXiv.2210.06028","url":null,"abstract":"Advances in computing have enabled widespread access to pose estimation, creating new sources of data streams. Unlike mock set-ups for data collection, tapping into these data streams through on-device active learning allows us to directly sample from the real world to improve the spread of the training distribution. However, on-device computing power is limited, implying that any candidate active learning algorithm should have a low compute footprint while also being reliable. Although multiple algorithms cater to pose estimation, they either use extensive compute to power state-of-the-art results or are not competitive in low-resource settings. We address this limitation with VL4Pose (Visual Likelihood For Pose Estimation), a first principles approach for active learning through out-of-distribution detection. We begin with a simple premise: pose estimators often predict incoherent poses for out-of-distribution samples. Hence, can we identify a distribution of poses the model has been trained on, to identify incoherent poses the model is unsure of? Our solution involves modelling the pose through a simple parametric Bayesian network trained via maximum likelihood estimation. Therefore, poses incurring a low likelihood within our framework are out-of-distribution samples making them suitable candidates for annotation. We also observe two useful side-outcomes: VL4Pose in-principle yields better uncertainty estimates by unifying joint and pose level ambiguity, as well as the unintentional but welcome ability of VL4Pose to perform pose refinement in limited scenarios. We perform qualitative and quantitative experiments on three datasets: MPII, LSP and ICVL, spanning human and hand pose estimation. Finally, we note that VL4Pose is simple, computationally inexpensive and competitive, making it suitable for challenging tasks such as on-device active learning.","PeriodicalId":72437,"journal":{"name":"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference","volume":"6 1","pages":"610"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87436115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Action Segmentation from Timestamp Supervision","authors":"Yaser Souri, Yazan Abu Farha, Emad Bahrami, G. Francesca, Juergen Gall","doi":"10.48550/arXiv.2210.06501","DOIUrl":"https://doi.org/10.48550/arXiv.2210.06501","url":null,"abstract":"Action segmentation is the task of predicting an action label for each frame of an untrimmed video. As obtaining annotations to train an approach for action segmentation in a fully supervised way is expensive, various approaches have been proposed to train action segmentation models using different forms of weak supervision, e.g., action transcripts, action sets, or more recently timestamps. Timestamp supervision is a promising type of weak supervision as obtaining one timestamp per action is less expensive than annotating all frames, but it provides more information than other forms of weak supervision. However, previous works assume that every action instance is annotated with a timestamp, which is a restrictive assumption since it assumes that annotators do not miss any action. In this work, we relax this restrictive assumption and take missing annotations for some action instances into account. We show that our approach is more robust to missing annotations compared to other approaches and various baselines.","PeriodicalId":72437,"journal":{"name":"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference","volume":"51 1","pages":"392"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74017315","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AISFormer: Amodal Instance Segmentation with Transformer","authors":"Minh-Triet Tran, Khoa T. Vo, Kashu Yamazaki, Arthur Fernandes, Michael Kidd, Ngan T. H. Le","doi":"10.48550/arXiv.2210.06323","DOIUrl":"https://doi.org/10.48550/arXiv.2210.06323","url":null,"abstract":"Amodal Instance Segmentation (AIS) aims to segment the region of both visible and possible occluded parts of an object instance. While Mask R-CNN-based AIS approaches have shown promising results, they are unable to model high-level features coherence due to the limited receptive field. The most recent transformer-based models show impressive performance on vision tasks, even better than Convolution Neural Networks (CNN). In this work, we present AISFormer, an AIS framework, with a Transformer-based mask head. AISFormer explicitly models the complex coherence between occluder, visible, amodal, and invisible masks within an object's regions of interest by treating them as learnable queries. Specifically, AISFormer contains four modules: (i) feature encoding: extract ROI and learn both short-range and long-range visual features. (ii) mask transformer decoding: generate the occluder, visible, and amodal mask query embeddings by a transformer decoder (iii) invisible mask embedding: model the coherence between the amodal and visible masks, and (iv) mask predicting: estimate output masks including occluder, visible, amodal and invisible. We conduct extensive experiments and ablation studies on three challenging benchmarks i.e. KINS, D2SA, and COCOA-cls to evaluate the effectiveness of AISFormer. The code is available at: https://github.com/UARK-AICV/AISFormer","PeriodicalId":72437,"journal":{"name":"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference","volume":"27 1","pages":"712"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74227943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semantic Segmentation under Adverse Conditions: A Weather and Nighttime-aware Synthetic Data-based Approach","authors":"Abdulrahman Kerim, Felipe C. Chamone, W. Ramos, L. Marcolino, Erickson R. Nascimento, Richard M. Jiang","doi":"10.48550/arXiv.2210.05626","DOIUrl":"https://doi.org/10.48550/arXiv.2210.05626","url":null,"abstract":"Recent semantic segmentation models perform well under standard weather conditions and sufficient illumination but struggle with adverse weather conditions and nighttime. Collecting and annotating training data under these conditions is expensive, time-consuming, error-prone, and not always practical. Usually, synthetic data is used as a feasible data source to increase the amount of training data. However, just directly using synthetic data may actually harm the model's performance under normal weather conditions while getting only small gains in adverse situations. Therefore, we present a novel architecture specifically designed for using synthetic training data for domain adaptation. We propose a simple yet powerful addition to DeepLabV3+ by using weather and time-of-the-day supervisors trained with multi-task learning, making it both weather and nighttime aware, which improves its mIoU accuracy by $14$ percentage points on the ACDC dataset while maintaining a score of $75%$ mIoU on the Cityscapes dataset. Our code is available at https://github.com/lsmcolab/Semantic-Segmentation-under-Adverse-Conditions.","PeriodicalId":72437,"journal":{"name":"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference","volume":"22 1","pages":"977"},"PeriodicalIF":0.0,"publicationDate":"2022-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74191902","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AMICO: Amodal Instance Composition","authors":"Peiye Zhuang, D. Demandolx, Ayush Saraf, Xuejian Rong, Changil Kim, Jia-Bin Huang","doi":"10.48550/arXiv.2210.05828","DOIUrl":"https://doi.org/10.48550/arXiv.2210.05828","url":null,"abstract":"Image composition aims to blend multiple objects to form a harmonized image. Existing approaches often assume precisely segmented and intact objects. Such assumptions, however, are hard to satisfy in unconstrained scenarios. We present Amodal Instance Composition for compositing imperfect -- potentially incomplete and/or coarsely segmented -- objects onto a target image. We first develop object shape prediction and content completion modules to synthesize the amodal contents. We then propose a neural composition model to blend the objects seamlessly. Our primary technical novelty lies in using separate foreground/background representations and blending mask prediction to alleviate segmentation errors. Our results show state-of-the-art performance on public COCOA and KINS benchmarks and attain favorable visual results across diverse scenes. We demonstrate various image composition applications such as object insertion and de-occlusion.","PeriodicalId":72437,"journal":{"name":"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference","volume":"9 1","pages":"55"},"PeriodicalIF":0.0,"publicationDate":"2022-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90260575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TetGAN: A Convolutional Neural Network for Tetrahedral Mesh Generation","authors":"William Gao, April Wang, G. Metzer, Raymond A. Yeh, R. Hanocka","doi":"10.48550/arXiv.2210.05735","DOIUrl":"https://doi.org/10.48550/arXiv.2210.05735","url":null,"abstract":"We present TetGAN, a convolutional neural network designed to generate tetrahedral meshes. We represent shapes using an irregular tetrahedral grid which encodes an occupancy and displacement field. Our formulation enables defining tetrahedral convolution, pooling, and upsampling operations to synthesize explicit mesh connectivity with variable topological genus. The proposed neural network layers learn deep features over each tetrahedron and learn to extract patterns within spatial regions across multiple scales. We illustrate the capabilities of our technique to encode tetrahedral meshes into a semantically meaningful latent-space which can be used for shape editing and synthesis. Our project page is at https://threedle.github.io/tetGAN/.","PeriodicalId":72437,"journal":{"name":"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference","volume":"14 1","pages":"365"},"PeriodicalIF":0.0,"publicationDate":"2022-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87254509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition","authors":"Alessandro Conti, P. Rota, Yiming Wang, E. Ricci","doi":"10.48550/arXiv.2210.05246","DOIUrl":"https://doi.org/10.48550/arXiv.2210.05246","url":null,"abstract":"Automatically understanding emotions from visual data is a fundamental task for human behaviour understanding. While models devised for Facial Expression Recognition (FER) have demonstrated excellent performances on many datasets, they often suffer from severe performance degradation when trained and tested on different datasets due to domain shift. In addition, as face images are considered highly sensitive data, the accessibility to large-scale datasets for model training is often denied. In this work, we tackle the above-mentioned problems by proposing the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for FER. Our method exploits self-supervised pretraining to learn good feature representations from the target data and proposes a novel and robust cluster-level pseudo-labelling strategy that accounts for in-cluster statistics. We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER, and is on par with methods addressing FER in the UDA setting.","PeriodicalId":72437,"journal":{"name":"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference","volume":"26 1","pages":"486"},"PeriodicalIF":0.0,"publicationDate":"2022-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85292607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HiFECap: Monocular High-Fidelity and Expressive Capture of Human Performances","authors":"Yue-Ren Jiang, Marc Habermann, Vladislav Golyanik, C. Theobalt","doi":"10.48550/arXiv.2210.05665","DOIUrl":"https://doi.org/10.48550/arXiv.2210.05665","url":null,"abstract":"Monocular 3D human performance capture is indispensable for many applications in computer graphics and vision for enabling immersive experiences. However, detailed capture of humans requires tracking of multiple aspects, including the skeletal pose, the dynamic surface, which includes clothing, hand gestures as well as facial expressions. No existing monocular method allows joint tracking of all these components. To this end, we propose HiFECap, a new neural human performance capture approach, which simultaneously captures human pose, clothing, facial expression, and hands just from a single RGB video. We demonstrate that our proposed network architecture, the carefully designed training strategy, and the tight integration of parametric face and hand models to a template mesh enable the capture of all these individual aspects. Importantly, our method also captures high-frequency details, such as deforming wrinkles on the clothes, better than the previous works. Furthermore, we show that HiFECap outperforms the state-of-the-art human performance capture approaches qualitatively and quantitatively while for the first time capturing all aspects of the human.","PeriodicalId":72437,"journal":{"name":"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference","volume":"70 1","pages":"826"},"PeriodicalIF":0.0,"publicationDate":"2022-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82285618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"APSNet: Attention Based Point Cloud Sampling","authors":"Yang Ye, Xiulong Yang, Shihao Ji","doi":"10.48550/arXiv.2210.05638","DOIUrl":"https://doi.org/10.48550/arXiv.2210.05638","url":null,"abstract":"Processing large point clouds is a challenging task. Therefore, the data is often downsampled to a smaller size such that it can be stored, transmitted and processed more efficiently without incurring significant performance degradation. Traditional task-agnostic sampling methods, such as farthest point sampling (FPS), do not consider downstream tasks when sampling point clouds, and thus non-informative points to the tasks are often sampled. This paper explores a task-oriented sampling for 3D point clouds, and aims to sample a subset of points that are tailored specifically to a downstream task of interest. Similar to FPS, we assume that point to be sampled next should depend heavily on the points that have already been sampled. We thus formulate point cloud sampling as a sequential generation process, and develop an attention-based point cloud sampling network (APSNet) to tackle this problem. At each time step, APSNet attends to all the points in a cloud by utilizing the history of previously sampled points, and samples the most informative one. Both supervised learning and knowledge distillation-based self-supervised learning of APSNet are proposed. Moreover, joint training of APSNet over multiple sample sizes is investigated, leading to a single APSNet that can generate arbitrary length of samples with prominent performances. Extensive experiments demonstrate the superior performance of APSNet against state-of-the-arts in various downstream tasks, including 3D point cloud classification, reconstruction, and registration.","PeriodicalId":72437,"journal":{"name":"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference","volume":"71 1","pages":"298"},"PeriodicalIF":0.0,"publicationDate":"2022-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81804963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Boosting Adversarial Robustness From The Perspective of Effective Margin Regularization","authors":"Ziquan Liu, Antoni B. Chan","doi":"10.48550/arXiv.2210.05118","DOIUrl":"https://doi.org/10.48550/arXiv.2210.05118","url":null,"abstract":"The adversarial vulnerability of deep neural networks (DNNs) has been actively investigated in the past several years. This paper investigates the scale-variant property of cross-entropy loss, which is the most commonly used loss function in classification tasks, and its impact on the effective margin and adversarial robustness of deep neural networks. Since the loss function is not invariant to logit scaling, increasing the effective weight norm will make the loss approach zero and its gradient vanish while the effective margin is not adequately maximized. On typical DNNs, we demonstrate that, if not properly regularized, the standard training does not learn large effective margins and leads to adversarial vulnerability. To maximize the effective margins and learn a robust DNN, we propose to regularize the effective weight norm during training. Our empirical study on feedforward DNNs demonstrates that the proposed effective margin regularization (EMR) learns large effective margins and boosts the adversarial robustness in both standard and adversarial training. On large-scale models, we show that EMR outperforms basic adversarial training, TRADES and two regularization baselines with substantial improvement. Moreover, when combined with several strong adversarial defense methods (MART and MAIL), our EMR further boosts the robustness.","PeriodicalId":72437,"journal":{"name":"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference","volume":"77 1","pages":"367"},"PeriodicalIF":0.0,"publicationDate":"2022-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83753598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}