{"title":"Splatty- A Unified Image Demosaicing and Rectification Method","authors":"Pranav Verma, D. Meyer, Hanyang Xu, F. Kuester","doi":"10.1109/WACV48630.2021.00083","DOIUrl":"https://doi.org/10.1109/WACV48630.2021.00083","url":null,"abstract":"Image demosaicing and rectification are key tasks that are frequently used in many computer vision systems. To date, however, their implementations have been plagued with large memory requirements and inconvenient dataflow, making it difficult to scale them to real-time, high resolution settings. This has motivated the development of joint demo-saicing and rectification algorithms that resolve the back-ward mapping dataflow for improved hardware implementation. Towards this purpose, we propose Splatty: an algorithmic solution to pipelined image stream demosaicing and rectification for memory bound applications requiring computational efficiency.We begin by introducing a polynomial Look-up-Table (LUT) compression scheme that can encode any arbitrarily complex lens model for rectification while keeping the remapping errors below 1E-10 pixels, and reducing the memory footprint to O(min(m, n)) from O(mn) for an m × n sized image. The core contribution leverages this LUT for a unified, forward-only splatting algorithm for simultaneous demosaicing and rectification. We demonstrate that merging these two steps into a single, forward-only splatting pass with interpolation, provides distinctive dataflow and performance efficiency benefits while maintaining quality standards when compared to state-of-the-art demosaicing and rectification algorithms.","PeriodicalId":236300,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133879593","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Fast Converging, Effective Conditional Generative Adversarial Networks with a Mirrored Auxiliary Classifier","authors":"Z. Wang","doi":"10.1109/WACV48630.2021.00261","DOIUrl":"https://doi.org/10.1109/WACV48630.2021.00261","url":null,"abstract":"Training conditional generative adversarial networks (GANs) has been remaining as a challenging task, though standard GANs have developed substantially and gained huge successes in recent years. In this paper, we propose a novel conditional GAN architecture with a mirrored auxiliary classifier (MAC-GAN) in its discriminator for the purpose of label conditioning. Unlike existing works, our mirrored auxiliary classifier contains both a real and a fake node for each specific class to distinguish real samples from generated samples that are assigned into the same category by previous models. Comparing with previous auxiliary classifier-based conditional GANs, our MAC-GAN learns a fast converging model for high-quality image generation, taking benefits from its robust, newly designed auxiliary classifier. Experiments on multiple benchmark datasets illustrate that our proposed model improves the quality of image synthesis compared with state-of-the-art approaches. Moreover, much better classification performance can be achieved with the mirrored auxiliary classifier, which can in turn promote the use of MAC-GAN in various transfer learning tasks.","PeriodicalId":236300,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129731812","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Robust and Efficient Framework for Sports-Field Registration","authors":"Xiaohan Nie, Shixing Chen, Raffay Hamid","doi":"10.1109/WACV48630.2021.00198","DOIUrl":"https://doi.org/10.1109/WACV48630.2021.00198","url":null,"abstract":"We propose a novel framework to register sports-fields as they appear in broadcast sports videos. Unlike previous approaches, we particularly address the challenge of field- registration when: (a) there are not enough distinguishable features on the field, and (b) no prior knowledge is available about the camera. To this end, we detect a grid of key- points distributed uniformly on the entire field instead of using only sparse local corners and line intersections, thereby extending the keypoint coverage to the texture-less parts of the field as well. To further improve keypoint based homography estimate, we differentialbly warp and align it with a set of dense field-features defined as normalized distance- map of pixels to their nearest lines and key-regions. We predict the keypoints and dense field-features simultaneously using a multi-task deep network to achieve computational efficiency. To have a comprehensive evaluation, we have compiled a new dataset called SportsFields which is collected from 192 video-clips from 5 different sports covering large environmental and camera variations. We empirically demonstrate that our algorithm not only achieves state of the art field-registration accuracy but also runs in real-time for HD resolution videos using commodity hardware.","PeriodicalId":236300,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"141 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133643129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Interactive Thin Object Selection","authors":"J. Liew, Scott D. Cohen, Brian L. Price, Long Mai, Jiashi Feng","doi":"10.1109/WACV48630.2021.00035","DOIUrl":"https://doi.org/10.1109/WACV48630.2021.00035","url":null,"abstract":"Existing deep learning based interactive segmentation methods have achieved remarkable performance with only a few user clicks, e.g. DEXTR [32] attaining 91.5% IoU on PASCAL VOC with only four extreme clicks. However, we observe even the state-of-the-art methods would often struggle in cases of objects to be segmented with elongated thin structures (e.g. bug legs and bicycle spokes). We investigate such failures, and find the critical reasons behind are two-fold: 1) lack of appropriate training dataset; and 2) extremely imbalanced distribution w.r.t. number of pixels belonging to thin and non-thin regions. Targeted at these challenges, we collect a large-scale dataset specifically for segmentation of thin elongated objects, named ThinObject-5K. Also, we present a novel integrative thin object segmentation network consisting of three streams. Among them, the high-resolution edge stream aims at preserving fine-grained details including elongated thin parts; the fixed-resolution context stream focuses on capturing semantic contexts. The two streams’ outputs are then amalgamated in the fusion stream to complement each other for help producing a refined segmentation output with sharper predictions around thin parts. Extensive experimental results well demonstrate the effectiveness of our proposed solution on segmenting thin objects, surpassing the baseline by ~ 30% IoUthin despite using only four clicks. Codes and dataset are available at https://github.com/liewjunhao/thin-object-selection.","PeriodicalId":236300,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132540529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Object Recoloring Using Adversarial Learning","authors":"Siavash Khodadadeh, Saeid Motiian, Zhe L. Lin, Ladislau Bölöni, S. Ghadar","doi":"10.1109/WACV48630.2021.00153","DOIUrl":"https://doi.org/10.1109/WACV48630.2021.00153","url":null,"abstract":"We propose a novel method for automatic object recoloring based on Generative Adversarial Networks (GANs). The user can simply give commands of the form recolor to which will be executed without any need of manual edit. Our approach takes advantage of pre-trained object detectors and saliency mask segmentation networks. The segmented mask of the given object along with the target color and the original image form the input to the GAN. The use of cycle consistency loss ensures the realistic look of the results. To our best knowledge, this is the first algorithm where the automatic recoloring is only limited by the ability of the mask extractor to map a natural language tag to a specific object in the image (several hundred object types at the time of this writing). For a performance comparison, we also adapted other state of the art methods to perform this task. We found that our method had consistently yielded qualitatively better recoloring results.","PeriodicalId":236300,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128648813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Future Moment Assessment for Action Query","authors":"Qiuhong Ke, Mario Fritz, B. Schiele","doi":"10.1109/WACV48630.2021.00326","DOIUrl":"https://doi.org/10.1109/WACV48630.2021.00326","url":null,"abstract":"In this paper, we aim to tackle the task of Assessing Future Moment of an Action of Interest (AFM-AI). The goal of this task is to assess if an action of interest will happen or not as well as the starting moment of the action. We aim to assess starting moments at any time-horizon of the future. To this end, we tackle the regression task of the starting moments as a generation task using a Deterministic Residual Guided Variational Regression Module (DR-VRM), which is built on a Variational Regression Module (VRM) and a deterministic residual network. The VRM takes the uncertainty into account and is capable of generating diverse predictions for the starting moment. The deterministic network encourages the VRM to learn from deterministic residual information in order to generate more precise predictions for moment assessment. Experimental results on three datasets clearly show that the proposed method is capable of generating both diverse and precise predictions of starting moments for query actions.","PeriodicalId":236300,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"192 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134011975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AutoRetouch: Automatic Professional Face Retouching","authors":"Alireza Shafaei","doi":"10.1109/WACV48630.2021.00103","DOIUrl":"https://doi.org/10.1109/WACV48630.2021.00103","url":null,"abstract":"Face retouching is one of the most time-consuming steps in professional photography pipelines. The existing auto-mated approaches blindly apply smoothing on the skin, destroying the delicate texture of the face. We present the first automatic face retouching approach that produces high-quality professional-grade results in less than two seconds. Unlike previous work, we show that our method preserves textures and distinctive features while retouching the skin. We demonstrate that our trained models generalize across datasets and are suitable for low-resolution cellphone images. Finally, we release the first large-scale, professionally retouched dataset with our baseline to encourage further work on the presented problem.","PeriodicalId":236300,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121739255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dense 3D-Reconstruction from Monocular Image Sequences for Computationally Constrained UAS∗","authors":"Matthias Domnik, Pedro F. Proença, J. Delaune, J. Thiem, R. Brockers","doi":"10.1109/WACV48630.2021.00186","DOIUrl":"https://doi.org/10.1109/WACV48630.2021.00186","url":null,"abstract":"The ability to find safe landing sites over complex 3D terrain is an essential safety feature for fully autonomous small unmanned aerial systems (UAS), which requires on-board perception for 3D reconstruction and terrain analysis if the overflown terrain is unknown. This is a challenge for UAS that are limited in size, weight and computational power, such as small rotorcrafts executing autonomous missions on Earth, or in planetary applications such as the Mars Helicopter. For such a computationally constraint system, we propose a structure from motion approach that uses inputs from a single downward facing camera to produce dense point clouds of the overflown terrain in real time. In contrast to existing approaches, our method uses metric pose information from a visual-inertial odometry algorithm as camera pose priors, which allows deploying a fast pose refinement step to align camera frames such that a conventional stereo algorithm can be used for dense 3D reconstruction. We validate the performance of our approach with extensive evaluations in simulation, and demonstrate the feasibility with data from UAS flights.","PeriodicalId":236300,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126078169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Boosting Monocular Depth with Panoptic Segmentation Maps","authors":"Faraz Saeedan, S. Roth","doi":"10.1109/WACV48630.2021.00390","DOIUrl":"https://doi.org/10.1109/WACV48630.2021.00390","url":null,"abstract":"Monocular depth prediction is ill-posed by nature; hence successful approaches need to exploit the available cues to the fullest. Yet, real-world training data with depth ground-truth suffers from limited variability and data acquired from depth sensors is also sparse and prone to noise. While available datasets with semantic annotations might help to better exploit semantic cues, they are not immediately usable for depth prediction. We show how to leverage panoptic segmentation maps to boost monocular depth predictors in stereo training setups. In particular, we augment a self-supervised training scheme through panoptic-guided smoothing, panoptic-guided alignment, and panoptic left-right consistency from ground truth or inferred panoptic segmentation maps. Our approach incurs only a minor overhead, can easily be applied to a wide range of depth estimation methods that are trained at least partially using stereo pairs, providing a substantial boost in accuracy.","PeriodicalId":236300,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122024975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}