{"title":"An Action Is Worth Multiple Words: Handling Ambiguity in Action Recognition","authors":"Kiyoon Kim, D. Moltisanti, Oisin Mac Aodha, Laura Sevilla-Lara","doi":"10.48550/arXiv.2210.04933","DOIUrl":"https://doi.org/10.48550/arXiv.2210.04933","url":null,"abstract":"Precisely naming the action depicted in a video can be a challenging and oftentimes ambiguous task. In contrast to object instances represented as nouns (e.g. dog, cat, chair, etc.), in the case of actions, human annotators typically lack a consensus as to what constitutes a specific action (e.g. jogging versus running). In practice, a given video can contain multiple valid positive annotations for the same action. As a result, video datasets often contain significant levels of label noise and overlap between the atomic action classes. In this work, we address the challenge of training multi-label action recognition models from only single positive training labels. We propose two approaches that are based on generating pseudo training examples sampled from similar instances within the train set. Unlike other approaches that use model-derived pseudo-labels, our pseudo-labels come from human annotations and are selected based on feature similarity. To validate our approaches, we create a new evaluation benchmark by manually annotating a subset of EPIC-Kitchens-100's validation set with multiple verb labels. We present results on this new test set along with additional results on a new version of HMDB-51, called Confusing-HMDB-102, where we outperform existing methods in both cases. Data and code are available at https://github.com/kiyoon/verb_ambiguity","PeriodicalId":72437,"journal":{"name":"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference","volume":"24 1","pages":"356"},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79953651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised Domain Adaptive Fundus Image Segmentation with Few Labeled Source Data","authors":"Qian Yu, Dongnan Liu, Chaoyi Zhang, Xinwen Zhang, Weidong (Tom) Cai","doi":"10.48550/arXiv.2210.04379","DOIUrl":"https://doi.org/10.48550/arXiv.2210.04379","url":null,"abstract":"Deep learning-based segmentation methods have been widely employed for automatic glaucoma diagnosis and prognosis. In practice, fundus images obtained by different fundus cameras vary significantly in terms of illumination and intensity. Although recent unsupervised domain adaptation (UDA) methods enhance the models' generalization ability on the unlabeled target fundus datasets, they always require sufficient labeled data from the source domain, bringing auxiliary data acquisition and annotation costs. To further facilitate the data efficiency of the cross-domain segmentation methods on the fundus images, we explore UDA optic disc and cup segmentation problems using few labeled source data in this work. We first design a Searching-based Multi-style Invariant Mechanism to diversify the source data style as well as increase the data amount. Next, a prototype consistency mechanism on the foreground objects is proposed to facilitate the feature alignment for each kind of tissue under different image styles. Moreover, a cross-style self-supervised learning stage is further designed to improve the segmentation performance on the target images. Our method has outperformed several state-of-the-art UDA segmentation methods under the UDA fundus segmentation with few labeled source data.","PeriodicalId":72437,"journal":{"name":"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference","volume":"70 1","pages":"237"},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85099680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SiNeRF: Sinusoidal Neural Radiance Fields for Joint Pose Estimation and Scene Reconstruction","authors":"Yitong Xia, Hao Tang, R. Timofte, L. Gool","doi":"10.48550/arXiv.2210.04553","DOIUrl":"https://doi.org/10.48550/arXiv.2210.04553","url":null,"abstract":"NeRFmm is the Neural Radiance Fields (NeRF) that deal with Joint Optimization tasks, i.e., reconstructing real-world scenes and registering camera parameters simultaneously. Despite NeRFmm producing precise scene synthesis and pose estimations, it still struggles to outperform the full-annotated baseline on challenging scenes. In this work, we identify that there exists a systematic sub-optimality in joint optimization and further identify multiple potential sources for it. To diminish the impacts of potential sources, we propose Sinusoidal Neural Radiance Fields (SiNeRF) that leverage sinusoidal activations for radiance mapping and a novel Mixed Region Sampling (MRS) for selecting ray batch efficiently. Quantitative and qualitative results show that compared to NeRFmm, SiNeRF achieves comprehensive significant improvements in image synthesis quality and pose estimation accuracy. Codes are available at https://github.com/yitongx/sinerf.","PeriodicalId":72437,"journal":{"name":"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference","volume":"15 1","pages":"131"},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74634376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scale Equivariant U-Net","authors":"Mateus Sangalli, Samy Blusseau, S. Velasco-Forero, J. Angulo","doi":"10.48550/arXiv.2210.04508","DOIUrl":"https://doi.org/10.48550/arXiv.2210.04508","url":null,"abstract":"In neural networks, the property of being equivariant to transformations improves generalization when the corresponding symmetry is present in the data. In particular, scale-equivariant networks are suited to computer vision tasks where the same classes of objects appear at different scales, like in most semantic segmentation tasks. Recently, convolutional layers equivariant to a semigroup of scalings and translations have been proposed. However, the equivariance of subsampling and upsampling has never been explicitly studied even though they are necessary building blocks in some segmentation architectures. The U-Net is a representative example of such architectures, which includes the basic elements used for state-of-the-art semantic segmentation. Therefore, this paper introduces the Scale Equivariant U-Net (SEU-Net), a U-Net that is made approximately equivariant to a semigroup of scales and translations through careful application of subsampling and upsampling layers and the use of aforementioned scale-equivariant layers. Moreover, a scale-dropout is proposed in order to improve generalization to different scales in approximately scale-equivariant architectures. The proposed SEU-Net is trained for semantic segmentation of the Oxford Pet IIIT and the DIC-C2DH-HeLa dataset for cell segmentation. The generalization metric to unseen scales is dramatically improved in comparison to the U-Net, even when the U-Net is trained with scale jittering, and to a scale-equivariant architecture that does not perform upsampling operators inside the equivariant pipeline. The scale-dropout induces better generalization on the scale-equivariant models in the Pet experiment, but not on the cell segmentation experiment.","PeriodicalId":72437,"journal":{"name":"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference","volume":"101 1","pages":"763"},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77329213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Memory Transformer Network for Incremental Learning","authors":"Ahmet Iscen, Thomas Bird, Mathilde Caron, A. Fathi, C. Schmid","doi":"10.48550/arXiv.2210.04485","DOIUrl":"https://doi.org/10.48550/arXiv.2210.04485","url":null,"abstract":"We study class-incremental learning, a training setup in which new classes of data are observed over time for the model to learn from. Despite the straightforward problem formulation, the naive application of classification models to class-incremental learning results in the\"catastrophic forgetting\"of previously seen classes. One of the most successful existing methods has been the use of a memory of exemplars, which overcomes the issue of catastrophic forgetting by saving a subset of past data into a memory bank and utilizing it to prevent forgetting when training future tasks. In our paper, we propose to enhance the utilization of this memory bank: we not only use it as a source of additional training data like existing works but also integrate it in the prediction process explicitly.Our method, the Memory Transformer Network (MTN), learns how to combine and aggregate the information from the nearest neighbors in the memory with a transformer to make more accurate predictions. We conduct extensive experiments and ablations to evaluate our approach. We show that MTN achieves state-of-the-art performance on the challenging ImageNet-1k and Google-Landmarks-1k incremental learning benchmarks.","PeriodicalId":72437,"journal":{"name":"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference","volume":"49 1","pages":"388"},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84964253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Turbo Training with Token Dropout","authors":"Tengda Han, Weidi Xie, Andrew Zisserman","doi":"10.48550/arXiv.2210.04889","DOIUrl":"https://doi.org/10.48550/arXiv.2210.04889","url":null,"abstract":"The objective of this paper is an efficient training method for video tasks. We make three contributions: (1) We propose Turbo training, a simple and versatile training paradigm for Transformers on multiple video tasks. (2) We illustrate the advantages of Turbo training on action classification, video-language representation learning, and long-video activity classification, showing that Turbo training can largely maintain competitive performance while achieving almost 4X speed-up and significantly less memory consumption. (3) Turbo training enables long-schedule video-language training and end-to-end long-video training, delivering competitive or superior performance than previous works, which were infeasible to train under limited resources.","PeriodicalId":72437,"journal":{"name":"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference","volume":"2 1","pages":"622"},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89451891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Less is More: Facial Landmarks can Recognize a Spontaneous Smile","authors":"Md. Tahrim Faroque, Yan Yang, Md. Zakir Hossain, S. M. Naim, Nabeel Mohammed, Shafin Rahman","doi":"10.48550/arXiv.2210.04240","DOIUrl":"https://doi.org/10.48550/arXiv.2210.04240","url":null,"abstract":"Smile veracity classification is a task of interpreting social interactions. Broadly, it distinguishes between spontaneous and posed smiles. Previous approaches used hand-engineered features from facial landmarks or considered raw smile videos in an end-to-end manner to perform smile classification tasks. Feature-based methods require intervention from human experts on feature engineering and heavy pre-processing steps. On the contrary, raw smile video inputs fed into end-to-end models bring more automation to the process with the cost of considering many redundant facial features (beyond landmark locations) that are mainly irrelevant to smile veracity classification. It remains unclear to establish discriminative features from landmarks in an end-to-end manner. We present a MeshSmileNet framework, a transformer architecture, to address the above limitations. To eliminate redundant facial features, our landmarks input is extracted from Attention Mesh, a pre-trained landmark detector. Again, to discover discriminative features, we consider the relativity and trajectory of the landmarks. For the relativity, we aggregate facial landmark that conceptually formats a curve at each frame to establish local spatial features. For the trajectory, we estimate the movements of landmark composed features across time by self-attention mechanism, which captures pairwise dependency on the trajectory of the same landmark. This idea allows us to achieve state-of-the-art performances on UVA-NEMO, BBC, MMI Facial Expression, and SPOS datasets.","PeriodicalId":72437,"journal":{"name":"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference","volume":"37 1","pages":"369"},"PeriodicalIF":0.0,"publicationDate":"2022-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90964482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robustifying the Multi-Scale Representation of Neural Radiance Fields","authors":"Nishant Jain, Suryansh Kumar, L. Gool","doi":"10.48550/arXiv.2210.04233","DOIUrl":"https://doi.org/10.48550/arXiv.2210.04233","url":null,"abstract":"Neural Radiance Fields (NeRF) recently emerged as a new paradigm for object representation from multi-view (MV) images. Yet, it cannot handle multi-scale (MS) images and camera pose estimation errors, which generally is the case with multi-view images captured from a day-to-day commodity camera. Although recently proposed Mip-NeRF could handle multi-scale imaging problems with NeRF, it cannot handle camera pose estimation error. On the other hand, the newly proposed BARF can solve the camera pose problem with NeRF but fails if the images are multi-scale in nature. This paper presents a robust multi-scale neural radiance fields representation approach to simultaneously overcome both real-world imaging issues. Our method handles multi-scale imaging effects and camera-pose estimation problems with NeRF-inspired approaches by leveraging the fundamentals of scene rigidity. To reduce unpleasant aliasing artifacts due to multi-scale images in the ray space, we leverage Mip-NeRF multi-scale representation. For joint estimation of robust camera pose, we propose graph-neural network-based multiple motion averaging in the neural volume rendering framework. We demonstrate, with examples, that for an accurate neural representation of an object from day-to-day acquired multi-view images, it is crucial to have precise camera-pose estimates. Without considering robustness measures in the camera pose estimation, modeling for multi-scale aliasing artifacts via conical frustum can be counterproductive. We present extensive experiments on the benchmark datasets to demonstrate that our approach provides better results than the recent NeRF-inspired approaches for such realistic settings.","PeriodicalId":72437,"journal":{"name":"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference","volume":"164 1","pages":"578"},"PeriodicalIF":0.0,"publicationDate":"2022-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86390531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Efficient Neural Scene Graphs by Learning Consistency Fields","authors":"Yeji Song, Chaerin Kong, Seoyoung Lee, N. Kwak, Joonseok Lee","doi":"10.48550/arXiv.2210.04127","DOIUrl":"https://doi.org/10.48550/arXiv.2210.04127","url":null,"abstract":"Neural Radiance Fields (NeRF) achieves photo-realistic image rendering from novel views, and the Neural Scene Graphs (NSG) cite{ost2021neural} extends it to dynamic scenes (video) with multiple objects. Nevertheless, computationally heavy ray marching for every image frame becomes a huge burden. In this paper, taking advantage of significant redundancy across adjacent frames in videos, we propose a feature-reusing framework. From the first try of naively reusing the NSG features, however, we learn that it is crucial to disentangle object-intrinsic properties consistent across frames from transient ones. Our proposed method, textit{Consistency-Field-based NSG (CF-NSG)}, reformulates neural radiance fields to additionally consider textit{consistency fields}. With disentangled representations, CF-NSG takes full advantage of the feature-reusing scheme and performs an extended degree of scene manipulation in a more controllable manner. We empirically verify that CF-NSG greatly improves the inference efficiency by using 85% less queries than NSG without notable degradation in rendering quality. Code will be available at: https://github.com/ldynx/CF-NSG","PeriodicalId":72437,"journal":{"name":"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference","volume":"37 1 1","pages":"302"},"PeriodicalIF":0.0,"publicationDate":"2022-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88203843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low Light Video Enhancement by Learning on Static Videos with Cross-Frame Attention","authors":"Shivam Chhirolya, Sameer Malik, R. Soundararajan","doi":"10.48550/arXiv.2210.04290","DOIUrl":"https://doi.org/10.48550/arXiv.2210.04290","url":null,"abstract":"The design of deep learning methods for low light video enhancement remains a challenging problem owing to the difficulty in capturing low light and ground truth video pairs. This is particularly hard in the context of dynamic scenes or moving cameras where a long exposure ground truth cannot be captured. We approach this problem by training a model on static videos such that the model can generalize to dynamic videos. Existing methods adopting this approach operate frame by frame and do not exploit the relationships among neighbouring frames. We overcome this limitation through a selfcross dilated attention module that can effectively learn to use information from neighbouring frames even when dynamics between the frames are different during training and test times. We validate our approach through experiments on multiple datasets and show that our method outperforms other state-of-the-art video enhancement algorithms when trained only on static videos.","PeriodicalId":72437,"journal":{"name":"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference","volume":"53 1","pages":"743"},"PeriodicalIF":0.0,"publicationDate":"2022-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85775878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}