{"title":"Performance comparison of DVS data spatial downscaling methods using Spiking Neural Networks","authors":"Amélie Gruel, Jean Martinet, B. Linares-Barranco, T. Serrano-Gotarredona","doi":"10.1109/WACV56688.2023.00643","DOIUrl":"https://doi.org/10.1109/WACV56688.2023.00643","url":null,"abstract":"Dynamic Vision Sensors (DVS) are an unconventional type of camera that produces sparse and asynchronous event data, which has recently led to a strong increase in its use for computer vision tasks namely in robotics. Embedded systems face limitations in terms of energy resources, memory, computational power, and communication bandwidth. Hence, this application calls for a way to reduce the amount of data to be processed while keeping the relevant information for the task at hand. We thus believe that a formal definition of event data reduction methods will provide a step further towards sparse data processing.The contributions of this paper are twofold: we introduce two complementary neuromorphic methods based on Spiking Neural Networks for DVS data spatial reduction, which is to best of our knowledge the first proposal of neuromorphic event data reduction; then we study for each method the trade-off between the amount of information kept after reduction, the performance of gesture classification after reduction and their capacity to handle events in real time. We demonstrate here that the proposed SNN-based methods outperform existing methods in a classification task for most dividing factors and are significantly better at handling data in real time, and make therefore the optimal choice for fully-integrated energy-efficient event data reduction running dynamically on a neuromorphic platform. Our code is publicly available online at: https://github.com/amygruel/EvVisu.","PeriodicalId":270631,"journal":{"name":"2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127726833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Empirical Generalization Study: Unsupervised Domain Adaptation vs. Domain Generalization Methods for Semantic Segmentation in the Wild","authors":"Fabrizio J. Piva, Daan de Geus, Gijs Dubbelman","doi":"10.1109/WACV56688.2023.00057","DOIUrl":"https://doi.org/10.1109/WACV56688.2023.00057","url":null,"abstract":"For autonomous vehicles and mobile robots to safely operate in the real world, i.e., the wild, scene understanding models should perform well in the many different scenarios that can be encountered. In reality, these scenarios are not all represented in the model’s training data, leading to poor performance. To tackle this, current training strategies attempt to either exploit additional unlabeled data with unsupervised domain adaptation (UDA), or to reduce overfitting using the limited available labeled data with domain generalization (DG). However, it is not clear from current literature which of these methods allows for better generalization to unseen data from the wild. Therefore, in this work, we present an evaluation framework in which the generalization capabilities of state-of-the-art UDA and DG methods can be compared fairly. From this evaluation, we find that UDA methods, which leverage unlabeled data, outperform DG methods in terms of generalization, and can deliver similar performance on unseen data as fully-supervised training methods that require all data to be labeled. We show that semantic segmentation performance can be increased up to 30% for a priori unknown data without using any extra labeled data.","PeriodicalId":270631,"journal":{"name":"2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)","volume":"7 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132340124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Representation Recovering for Self-Supervised Pre-training on Medical Images","authors":"Xiangyi Yan, Junayed Naushad, Shanlin Sun, Kun Han, Hao Tang, Deying Kong, Haoyu Ma, Chenyu You, Xiaohui Xie","doi":"10.1109/WACV56688.2023.00271","DOIUrl":"https://doi.org/10.1109/WACV56688.2023.00271","url":null,"abstract":"Advances in self-supervised learning have drawn attention to developing techniques to extract effective visual representations from unlabeled images. Contrastive learning (CL) trains a model to extract consistent features by generating different views. Recent success of Masked Autoencoders (MAE) highlights the benefit of generative modeling in self-supervised learning. The generative approaches encode the input into a compact embedding and empower the model’s ability of recovering the original input. However, in our experiments, we found vanilla MAE mainly recovers coarse high level semantic information and is inadequate in recovering detailed low level information. We show that in dense downstream prediction tasks like multi-organ segmentation, directly applying MAE is not ideal. Here, we propose RepRec, a hybrid visual representation learning framework for self-supervised pre-training on large-scale unlabelled medical datasets, which takes advantage of both contrastive and generative modeling. To solve the aforementioned dilemma that MAE encounters, a convolutional encoder is pre-trained to provide low-level feature information, in a contrastive way; and a transformer encoder is pre-trained to produce high level semantic dependency, in a generative way – by recovering masked representations from the convolutional encoder. Extensive experiments on three multi-organ segmentation datasets demonstrate that our method outperforms current state-of-the-art methods.","PeriodicalId":270631,"journal":{"name":"2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134342808","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multivariate Probabilistic Monocular 3D Object Detection","authors":"Xuepeng Shi, Zhixiang Chen, Tae-Kyun Kim","doi":"10.1109/WACV56688.2023.00426","DOIUrl":"https://doi.org/10.1109/WACV56688.2023.00426","url":null,"abstract":"In autonomous driving, monocular 3D object detection is an important but challenging task. Towards accurate monocular 3D object detection, some recent methods recover the distance of objects from the physical height and visual height of objects. Such decomposition framework can introduce explicit constraints on the distance prediction, thus improving its accuracy and robustness. However, the inaccurate physical height and visual height prediction still may exacerbate the inaccuracy of the distance prediction. In this paper, we improve the framework by multivariate probabilistic modeling. We explicitly model the joint probability distribution of the physical height and visual height. This is achieved by learning a full covariance matrix of the physical height and visual height during training, with the guide of a multivariate likelihood. Such explicit joint probability distribution modeling not only leads to robust distance prediction when both the predicted physical height and visual height are inaccurate, but also brings learned covariance matrices with expected behaviors. The experimental results on the challenging Waymo Open and KITTI datasets show the effectiveness of our framework1.","PeriodicalId":270631,"journal":{"name":"2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132976384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Burst Vision Using Single-Photon Cameras","authors":"Sizhuo Ma, Paul Mos, E. Charbon, Mohit Gupta","doi":"10.1109/WACV56688.2023.00534","DOIUrl":"https://doi.org/10.1109/WACV56688.2023.00534","url":null,"abstract":"Single-photon avalanche diodes (SPADs) are novel image sensors that record the arrival of individual photons at extremely high temporal resolution. In the past, they were only available as single pixels or small-format arrays, for various active imaging applications such as LiDAR and microscopy. Recently, high-resolution SPAD arrays up to 3.2 megapixel have been realized, which for the first time may be able to capture sufficient spatial details for general computer vision tasks, purely as a passive sensor. However, existing vision algorithms are not directly applicable on the binary data captured by SPADs. In this paper, we propose developing quanta vision algorithms based on burst processing for extracting scene information from SPAD photon streams. With extensive real-world data, we demonstrate that current SPAD arrays, along with burst processing as an example plug-and-play algorithm, are capable of a wide range of downstream vision tasks in extremely challenging imaging conditions including fast motion, low light (< 5 lux) and high dynamic range. To our knowledge, this is the first attempt to demonstrate the capabilities of SPAD sensors for a wide gamut of real-world computer vision tasks including object detection, pose estimation, SLAM, and text recognition. We hope this work will inspire future research into developing computer vision algorithms in extreme scenarios using single-photon cameras.","PeriodicalId":270631,"journal":{"name":"2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128941829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A neural video codec with spatial rate-distortion control","authors":"Noor Fathima Ghouse, Jens Petersen, Guillaume Sautière, A. Wiggers, R. Pourreza","doi":"10.1109/WACV56688.2023.00533","DOIUrl":"https://doi.org/10.1109/WACV56688.2023.00533","url":null,"abstract":"Neural video compression algorithms are nearly competitive with hand-crafted codecs in terms of rate-distortion performance and subjective quality. However, many neural codecs are inflexible black boxes, and give users little to no control over the reconstruction quality and bitrate. In this work, we present a flexible neural video codec that combines ideas from variable-bitrate codecs and region-of-interest-based coding. By conditioning our model on a global rate-distortion tradeoff parameter and a region-of-interest (ROI) mask, we obtain dynamic control over the per-frame bitrate and the reconstruction quality in the ROI at test time. The resulting codec enables practical use cases such as coding under bitrate constraints with fixed ROI quality, while taking a negligible hit in performance compared to a fixed-rate model. We find that our codec performs best on sequences with complex motion, where we substantially outperform non-ROI codecs in the region of interest with Bjøntegaard-Delta rate savings exceeding 60%.","PeriodicalId":270631,"journal":{"name":"2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)","volume":"312-315 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130862065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Protocol for Evaluating Model Interpretation Methods from Visual Explanations","authors":"Hamed Behzadi Khormuji, José Oramas","doi":"10.1109/WACV56688.2023.00147","DOIUrl":"https://doi.org/10.1109/WACV56688.2023.00147","url":null,"abstract":"With the continuous development of Convolutional Neural Networks (CNNs), there is an increasing requirement towards the understanding of the representations they internally encode. The task of studying such encoded representations is referred to as model interpretation. Efforts along this direction, despite being proved efficient, stand with two weaknesses. First, there is low semanticity on the feedback they provide which leads toward subjective visualizations. Second, there is no unified protocol for the quantitative evaluation of interpretation methods which makes the comparison between current and future methods complex.To address these issues, we propose a unified evaluation protocol for the quantitative evaluation of interpretation methods. This is achieved by enhancing existing interpretation methods to be capable of generating visual explanations and then linking these explanations with a semantic label. To achieve this, we introduce the Weighted Average Intersection-over-Union (WAIoU) metric to estimate the coverage rate between explanation heatmaps and semantic annotations. This is complemented with an analysis of several binarization techniques for heatmaps, necessary when measuring coverage. Experiments considering several interpretation methods covering different CNN architectures pre-trained on multiple datasets show the effectiveness of the proposed protocol.","PeriodicalId":270631,"journal":{"name":"2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)","volume":"47 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132782264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards A Framework for Privacy-Preserving Pedestrian Analysis","authors":"Anil Kunchala, Mélanie Bouroche, Bianca Schoen-Phelan","doi":"10.1109/WACV56688.2023.00435","DOIUrl":"https://doi.org/10.1109/WACV56688.2023.00435","url":null,"abstract":"The design of pedestrian-friendly infrastructures plays a crucial role in creating sustainable transportation in urban environments. Analyzing pedestrian behaviour in response to existing infrastructure is pivotal to planning, maintaining, and creating more pedestrian-friendly facilities. Many approaches have been proposed to extract such behaviour by applying deep learning models to video data. Video data, however, includes an broad spectrum of privacy-sensitive information about individuals, such as their location at a given time or who they are with. Most of the existing models use privacy-invasive methodologies to track, detect, and analyse individual or group pedestrian behaviour patterns. As a step towards privacy-preserving pedestrian analysis, this paper introduces a framework to anonymize all pedestrians before analyzing their behaviors. The proposed framework leverages recent developments in 3D wireframe reconstruction and digital in-painting to represent pedestrians with quantitative wireframes by removing their images while preserving pose, shape, and background scene context. To evaluate the proposed framework, a generic metric is introduced for each of privacy and utility. Experimental evaluation on widely-used datasets shows that the proposed framework outperforms traditional and state-of-the-art image filtering approaches by generating best privacy utility trade-off.","PeriodicalId":270631,"journal":{"name":"2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)","volume":"315 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133694087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Center-aware Adversarial Augmentation for Single Domain Generalization","authors":"Tianle Chen, Mahsa Baktash, Zijian Wang, M. Salzmann","doi":"10.1109/WACV56688.2023.00414","DOIUrl":"https://doi.org/10.1109/WACV56688.2023.00414","url":null,"abstract":"Domain generalization (DG) aims to learn a model from multiple training (i.e., source) domains that can generalize well to the unseen test (i.e., target) data coming from a different distribution. Single domain generalization (Single-DG) has recently emerged to tackle a more challenging, yet realistic setting, where only one source domain is available at training time. The existing Single-DG approaches typically are based on data augmentation strategies and aim to expand the span of source data by augmenting out-of-domain samples. Generally speaking, they aim to generate hard examples to confuse the classifier. While this may make the classifier robust to small perturbation, the generated samples are typically not diverse enough to mimic a large domain shift, resulting in sub-optimal generalization performance. To alleviate this, we propose a center-aware adversarial augmentation technique that expands the source distribution by altering the source samples so as to push them away from the class centers via a novel angular center loss. We conduct extensive experiments to demonstrate the effectiveness of our approach on several benchmark datasets for Single-DG and show that our method outperforms the state-of-the-art in most cases.","PeriodicalId":270631,"journal":{"name":"2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115188500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semi-Supervised Learning for Low-light Image Restoration through Quality Assisted Pseudo-Labeling","authors":"Sameer Malik, R. Soundararajan","doi":"10.1109/WACV56688.2023.00409","DOIUrl":"https://doi.org/10.1109/WACV56688.2023.00409","url":null,"abstract":"Convolutional neural networks have been successful in restoring images captured under poor illumination conditions. Nevertheless, such approaches require a large number of paired low-light and ground truth images for training. Thus, we study the problem of semi-supervised learning for low-light image restoration when limited low-light images have ground truth labels. Our main contributions in this work are twofold. We first deploy an ensemble of low-light restoration networks to restore the unlabeled images and generate a set of potential pseudo-labels. We model the contrast distortions in the labeled set to generate different sets of training data and create the ensemble of networks. We then design a contrastive self-supervised learning based image quality measure to obtain the pseudo-label among the images restored by the ensemble. We show that training the restoration network with the pseudo-labels allows us to achieve excellent restoration performance even with very few labeled pairs. We conduct extensive experiments on three popular low-light image restoration datasets to show the superior performance of our semi-supervised low-light image restoration compared to other approaches. Project page is available at https://github.com/sameerIISc/SSL-LLR.","PeriodicalId":270631,"journal":{"name":"2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114462236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}