BMVC: Proceedings of the British Machine Vision Conference — Latest Publications

Correlation between Alignment-Uniformity and Performance of Dense Contrastive Representations
Authors: J. Moon, Wonjae Kim, E. Choi
DOI: https://doi.org/10.48550/arXiv.2210.08819
Abstract: Recently, dense contrastive learning has shown superior performance on dense prediction tasks compared to instance-level contrastive learning. Despite its success, the properties of dense contrastive representations have not yet been carefully studied. We therefore analyze dense contrastive learning using a standard CNN and a straightforward feature-matching scheme rather than proposing a new, complex method. Inspired by the analysis of instance-level contrastive representations through the lens of alignment and uniformity on the hypersphere, we employ and extend the same lens to analyze the underexplored properties of dense contrastive representations. We discover the core principle for constructing a positive pair of dense features and empirically prove its validity. We also introduce a new scalar metric that summarizes the correlation between alignment-and-uniformity and downstream performance. Using this metric, we study various facets of densely learned contrastive representations, such as how the correlation changes across single- and multi-object datasets, and across linear evaluation and dense prediction tasks. Source code: https://github.com/SuperSupermoon/DenseCL-analysis
Published: 2022-10-17 · Pages: 844
Citations: 2
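The alignment-and-uniformity lens mentioned in the abstract comes from the instance-level analysis of contrastive representations on the hypersphere. A minimal NumPy sketch of the two instance-level metrics (the function names and toy features are illustrative, not taken from the paper):

```python
import numpy as np

def align_loss(x, y, alpha=2):
    """Alignment: mean distance between positive-pair features (lower is better)."""
    return np.mean(np.linalg.norm(x - y, axis=1) ** alpha)

def uniform_loss(x, t=2):
    """Uniformity: log of the mean Gaussian potential over all pairs (lower is better)."""
    sq = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)  # pairwise squared distances
    iu = np.triu_indices(len(x), k=1)                           # unique pairs only
    return np.log(np.mean(np.exp(-t * sq[iu])))

rng = np.random.default_rng(0)
f = rng.normal(size=(128, 32))
f /= np.linalg.norm(f, axis=1, keepdims=True)   # project features onto the unit hypersphere
g = f + 0.05 * rng.normal(size=f.shape)
g /= np.linalg.norm(g, axis=1, keepdims=True)   # noisy positive views of the same samples

a, u = align_loss(f, g), uniform_loss(f)
```

The paper extends this lens from whole-image features to dense (per-location) features; the sketch above only shows the underlying instance-level quantities.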
Track Targets by Dense Spatio-Temporal Position Encoding
Authors: Jinkun Cao, Hao Wu, Kris Kitani
DOI: https://doi.org/10.48550/arXiv.2210.09455
Abstract: In this work, we propose a novel paradigm for encoding the position of targets for target tracking in videos using transformers. The proposed paradigm, Dense Spatio-Temporal (DST) position encoding, encodes spatio-temporal position information in a pixel-wise dense fashion. The encoding provides location information for associating targets across frames beyond appearance matching by comparing objects in two bounding boxes. In contrast to typical transformer positional encodings, ours is applied to the 2D CNN features rather than the projected feature vectors, avoiding the loss of positional information. Moreover, the DST encoding can uniformly represent both the location of a single-frame object and the evolution of a trajectory's location across frames. Integrated with the DST encoding, we build a transformer-based multi-object tracking model that takes a video clip as input and associates targets within the clip. It can also perform online inference by associating existing trajectories with objects from newly arriving frames. Experiments on video multi-object tracking (MOT) and multi-object tracking and segmentation (MOTS) datasets demonstrate the effectiveness of the proposed DST position encoding.
Published: 2022-10-17 · Pages: 311
Citations: 4
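To make "pixel-wise dense spatio-temporal encoding" concrete, here is one plausible construction: a sinusoidal code over (time, row, column) attached to every location of a feature map. The even channel split and frequency schedule are assumptions for illustration, not the paper's exact design:

```python
import numpy as np

def dense_st_encoding(T, H, W, d=48):
    """Pixel-wise spatio-temporal encoding of shape (T, H, W, d).

    Splits the channels evenly across the time, row, and column axes and
    fills each third with the standard sine/cosine positional pattern.
    """
    def axis_enc(positions, dim):
        freqs = 1.0 / (10000 ** (np.arange(0, dim, 2) / dim))
        ang = positions[:, None] * freqs[None, :]
        return np.concatenate([np.sin(ang), np.cos(ang)], axis=1)  # (len, dim)

    dt = d // 3
    et = axis_enc(np.arange(T), dt)   # (T, dt) temporal code
    ey = axis_enc(np.arange(H), dt)   # (H, dt) row code
    ex = axis_enc(np.arange(W), dt)   # (W, dt) column code
    return np.concatenate([
        np.broadcast_to(et[:, None, None, :], (T, H, W, dt)),
        np.broadcast_to(ey[None, :, None, :], (T, H, W, dt)),
        np.broadcast_to(ex[None, None, :, :], (T, H, W, dt)),
    ], axis=-1)

pe = dense_st_encoding(4, 8, 8)
print(pe.shape)  # (4, 8, 8, 48)
```

Each (frame, pixel) pair receives a distinct code, so the same object at the same pixel in two frames is still distinguishable — the property the tracker relies on for association beyond appearance.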
Approximating Continuous Convolutions for Deep Network Compression
Authors: Theo W. Costain, V. Prisacariu
DOI: https://doi.org/10.48550/arXiv.2210.08951
Abstract: We present ApproxConv, a novel method for compressing the layers of a convolutional neural network. Reframing conventional discrete convolution as continuous convolution of parametrised functions over space, we use functional approximations to capture the essential structure of CNN filters with fewer parameters than conventional operations. Our method reduces the size of trained CNN layers while requiring only a small amount of fine-tuning. We show that it can compress existing deep network models by half while losing only 1.86% accuracy. Further, we demonstrate that it is compatible with other compression methods such as quantisation, allowing further reductions in model size.
Published: 2022-10-17 · Pages: 27
Citations: 0
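The core idea — treating a discrete kernel as samples of a continuous function and storing only that function's parameters — can be sketched with a least-squares polynomial fit. The polynomial basis here is an illustrative stand-in; the paper's actual parametrisation may differ:

```python
import numpy as np

def fit_kernel(kernel, order=2):
    """Least-squares fit of a 2D polynomial f(y, x) to a discrete kernel.

    A 5x5 kernel (25 weights) is summarised by the polynomial's
    coefficients (6 for order 2), which can be re-sampled on any grid.
    """
    k = kernel.shape[0]
    ys, xs = np.mgrid[0:k, 0:k] / (k - 1)            # normalised sample coordinates
    cols = [ys.ravel() ** i * xs.ravel() ** j
            for i in range(order + 1) for j in range(order + 1 - i)]
    A = np.stack(cols, axis=1)                       # design matrix, one basis fn per column
    coef, *_ = np.linalg.lstsq(A, kernel.ravel(), rcond=None)
    return coef, A @ coef                            # coefficients and reconstruction

# A smooth (Gaussian-like) kernel compresses well under a low-order fit.
g = np.fromfunction(lambda y, x: np.exp(-(((y - 2) / 2) ** 2 + ((x - 2) / 2) ** 2)), (5, 5))
coef, recon = fit_kernel(g)
print(coef.size)  # 6 parameters instead of the kernel's 25 weights
```

A short fine-tuning pass, as the abstract notes, would then recover accuracy lost to the approximation error.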
Semantic Segmentation with Active Semi-Supervised Representation Learning
Authors: Aneesh Rangnekar, Christopher Kanan, Matthew Hoffman
DOI: https://doi.org/10.48550/arXiv.2210.08403
Abstract: Obtaining per-pixel human labels for semantic segmentation is incredibly laborious, often making labeled dataset construction prohibitively expensive. Here, we address this problem with a novel algorithm that combines semi-supervised and active learning, making it possible to train an effective semantic segmentation model with significantly less labeled data. We extend the prior state-of-the-art S4AL algorithm by replacing its mean-teacher approach to semi-supervised learning with a self-training approach that learns better from noisy labels. We further boost the network's ability to query useful data by adding a contrastive learning head, which leads to better understanding of the objects in the scene and hence better queries for active learning. We evaluate our method on the CamVid and CityScapes datasets, the de facto standards for active learning in semantic segmentation. We achieve more than 95% of the network's full performance on both, using only 12.1% and 15.1% of the labeled data, respectively. We also benchmark our method against existing stand-alone semi-supervised learning methods on CityScapes and achieve superior performance without bells and whistles.
Published: 2022-10-16 · Pages: 229
Citations: 0
Self-Improving SLAM in Dynamic Environments: Learning When to Mask
Authors: Adrian Bojko, R. Dupont, M. Tamaazousti, H. Borgne
DOI: https://doi.org/10.48550/arXiv.2210.08350
Abstract: Visual SLAM (Simultaneous Localization and Mapping) in dynamic environments typically relies on identifying and masking image features on moving objects to prevent them from degrading performance. Current approaches are suboptimal: they either fail to mask objects when needed or mask objects needlessly. We therefore propose a novel SLAM that learns when masking objects improves its performance in dynamic scenarios. Given a method to segment objects and a SLAM system, we give the latter the ability of Temporal Masking, i.e., to infer when certain classes of objects should be masked to maximize any given SLAM metric. We impose no priors on motion: our method learns to mask moving objects by itself. To avoid high annotation costs, we created an automatic annotation method for self-supervised training. We also constructed a new dataset, named ConsInv, which includes challenging real-world dynamic sequences recorded indoors and outdoors. Our method reaches the state of the art on the TUM RGB-D dataset and outperforms it on the KITTI and ConsInv datasets.
Published: 2022-10-15 · Pages: 654
Citations: 1
Wide Range MRI Artifact Removal with Transformers
Authors: Lennart Alexander Van der Goten, Kevin Smith
DOI: https://doi.org/10.48550/arXiv.2210.07976
Abstract: Artifacts on magnetic resonance scans are a serious challenge for both radiologists and computer-aided diagnosis systems. Most commonly, artifacts are caused by patient motion, but they can also arise from device-specific abnormalities such as noise patterns. Irrespective of the source, artifacts can not only render a scan useless but can potentially induce misdiagnoses if left unnoticed; for instance, an artifact may masquerade as a tumor or other abnormality. Retrospective artifact correction (RAC) is concerned with removing artifacts after the scan has already been taken. In this work, we propose a method capable of retrospectively removing eight common artifacts found in native-resolution MR imagery. Knowledge of the presence or location of a specific artifact is not assumed, and the system is, by design, capable of undoing interactions of multiple artifacts. Our method is realized through a novel volumetric transformer-based neural network that generalizes the window-centered approach popularized by the Swin transformer. Unlike Swin, our method (i) is natively volumetric, (ii) is geared towards dense prediction tasks instead of classification, and (iii) uses a novel and more global mechanism to enable information exchange between windows. Our experiments show that our reconstructions are considerably better than those attained by ResNet, V-Net, MobileNet-v2, DenseNet, CycleGAN and BicycleGAN. Moreover, we show that the reconstructed images from our model improve the accuracy of FSL BET, a standard skull-stripping method typically applied in diagnostic workflows.
Published: 2022-10-14 · Pages: 846
Citations: 3
Polycentric Clustering and Structural Regularization for Source-free Unsupervised Domain Adaptation
Authors: Xinyu Guan, Han Sun, Ningzhong Liu, Huiyu Zhou
DOI: https://doi.org/10.48550/arXiv.2210.07463
Abstract: Source-Free Domain Adaptation (SFDA) aims to solve the domain adaptation problem by transferring the knowledge learned by a pre-trained source model to an unseen target domain. Most existing methods assign pseudo-labels to the target data by generating feature prototypes. However, due to the discrepancy between the source and target data distributions and category imbalance in the target domain, the generated feature prototypes suffer severe class bias and the pseudo-labels are noisy. Moreover, the data structure of the target domain, which is crucial for clustering, is often ignored. In this paper, we propose a novel framework named PCSR that tackles SFDA via an intra-class Polycentric Clustering and Structural Regularization strategy. First, an inter-class balanced sampling strategy generates representative feature prototypes for each class. Furthermore, k-means clustering produces multiple cluster centers for each class in the target domain, yielding robust pseudo-labels. Finally, to enhance the model's generalization, structural regularization is introduced for the target domain. Extensive experiments on three UDA benchmark datasets show that our method performs comparably to or better than other state-of-the-art methods, demonstrating its effectiveness for visual domain adaptation problems.
Published: 2022-10-14 · Pages: 485
Citations: 2
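The polycentric step — multiple k-means centers per class, then pseudo-labels by nearest center across all classes — can be sketched in NumPy. The helper names and the toy two-class data are illustrative; in the actual SFDA setting the initial labels would come from the source model's predictions:

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Tiny k-means; returns k cluster centers of x."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        lab = np.linalg.norm(x[:, None] - centers[None], axis=-1).argmin(axis=1)
        for c in range(k):
            if np.any(lab == c):
                centers[c] = x[lab == c].mean(axis=0)
    return centers

def polycentric_pseudo_labels(feats, labels, n_classes, k=3):
    """Assign each feature the class owning its nearest of k-per-class centers."""
    centers, owner = [], []
    for c in range(n_classes):
        pts = feats[labels == c]
        kc = min(k, len(pts))
        centers.append(kmeans(pts, kc, seed=c))
        owner += [c] * kc                            # which class each center belongs to
    centers, owner = np.vstack(centers), np.array(owner)
    d = np.linalg.norm(feats[:, None] - centers[None], axis=-1)
    return owner[d.argmin(axis=1)]

# Two well-separated feature clusters with tentative labels.
rng = np.random.default_rng(2)
feats = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
init = np.array([0] * 50 + [1] * 50)
refined = polycentric_pseudo_labels(feats, init, n_classes=2)
```

Using several centers per class rather than a single prototype is what lets the method follow multi-modal, imbalanced class distributions in the target domain.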
How to Train Vision Transformer on Small-scale Datasets?
Authors: Hanan Gani, Muzammal Naseer, Mohammad Yaqub
DOI: https://doi.org/10.48550/arXiv.2210.07240
Abstract: The Vision Transformer (ViT), a radically different architecture from convolutional neural networks, offers multiple advantages including design simplicity, robustness and state-of-the-art performance on many vision tasks. However, in contrast to convolutional neural networks, the Vision Transformer lacks inherent inductive biases. Successful training of such models is therefore mainly attributed to pre-training on large-scale datasets such as ImageNet (1.2M images) or JFT (300M images), which hinders the direct adaptation of Vision Transformers to small-scale datasets. In this work, we show that self-supervised inductive biases can be learned directly from small-scale datasets and serve as an effective weight initialization scheme for fine-tuning. This makes it possible to train these models without large-scale pre-training or changes to model architecture or loss functions. We present thorough experiments successfully training monolithic and non-monolithic Vision Transformers on five small datasets (CIFAR10/100, CINIC10, SVHN, Tiny-ImageNet) and two fine-grained datasets (Aircraft and Cars). Our approach consistently improves the performance of Vision Transformers while retaining their properties, such as attention to salient regions and higher robustness. Code and pre-trained models: https://github.com/hananshafi/vits-for-small-scale-datasets
Published: 2022-10-13 · Pages: 731
Citations: 15
Shape Preserving Facial Landmarks with Graph Attention Networks
Authors: Andrés Prados-Torreblanca, J. M. Buenaposada, L. Baumela
DOI: https://doi.org/10.48550/arXiv.2210.07233
Abstract: Top-performing landmark estimation algorithms exploit the excellent ability of large convolutional neural networks (CNNs) to represent local appearance. However, it is well known that CNNs can only learn weak spatial relationships. To address this problem, we propose a model that combines a CNN with a cascade of Graph Attention Network regressors. To this end, we introduce an encoding that jointly represents the appearance and location of facial landmarks, together with an attention mechanism that weighs the information according to its reliability. This is combined with a multi-task approach to initialize the location of graph nodes and a coarse-to-fine landmark description scheme. Our experiments confirm that the proposed model learns a global representation of the structure of the face, achieving top performance in popular benchmarks for head pose and landmark estimation. The improvement provided by our model is most significant in situations involving large changes in the local appearance of landmarks.
Published: 2022-10-13 · Pages: 155
Citations: 4
Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors
Authors: Vladimir E. Iashin, Weidi Xie, Esa Rahtu, Andrew Zisserman
DOI: https://doi.org/10.48550/arXiv.2210.07055
Abstract: The objective of this paper is audio-visual synchronisation of general videos 'in the wild'. For such videos, the events that may be harnessed as synchronisation cues may be spatially small and may occur only infrequently during a video clip many seconds long, i.e. the synchronisation signal is 'sparse in space and time'. This contrasts with synchronising videos of talking heads, where audio-visual correspondence is dense in both time and space. We make four contributions: (i) to handle the longer temporal sequences required by sparse synchronisation signals, we design a multi-modal transformer model that employs 'selectors' to distil the long audio and visual streams into small sequences that are then used to predict the temporal offset between streams; (ii) we identify artefacts, introduced by the compression codecs used for audio and video, that audio-visual models can exploit during training to artificially solve the synchronisation task; (iii) we curate a dataset with only sparse-in-time-and-space synchronisation signals; and (iv) we show the effectiveness of the proposed model on both dense and sparse datasets, quantitatively and qualitatively. Project page: v-iashin.github.io/SparseSync
Published: 2022-10-13 · Pages: 395
Citations: 6
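One plausible reading of the 'selector' mechanism is cross-attention pooling: a small set of trainable query vectors attends over a long feature stream and distils it into as many tokens as there are queries. A minimal NumPy sketch (the names and shapes are assumptions, not the paper's exact design):

```python
import numpy as np

def selector_pool(stream, queries):
    """Distil a long stream (L, d) into len(queries) tokens via cross-attention."""
    scores = queries @ stream.T / np.sqrt(stream.shape[1])  # (n, L) scaled dot-products
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                       # softmax over stream positions
    return w @ stream                                       # (n, d) pooled tokens

rng = np.random.default_rng(3)
audio = rng.normal(size=(512, 64))   # long audio feature stream
q = rng.normal(size=(8, 64))         # trainable selector queries (illustrative)
summary = selector_pool(audio, q)
print(summary.shape)  # (8, 64)
```

Compressing each stream this way is what makes attention over the many-seconds-long clips tractable before the offset prediction step.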