A Single-Stage, Bottom-up Approach for Occluded VIS using Spatio-temporal Embeddings
A. Athar, Sabarinath Mahadevan, Aljosa Osep, L. Leal-Taixé, B. Leibe
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), October 2021. DOI: 10.1109/ICCVW54120.2021.00431 (https://doi.org/10.1109/ICCVW54120.2021.00431)

Abstract: The task of Video Instance Segmentation (VIS) involves segmenting, tracking and classifying all object instances present in a given video clip. Occluded VIS is a more challenging extension of this task, involving longer video sequences in which objects undergo significant occlusions over time. Most existing approaches to VIS involve multiple networks which separately handle segmenting, tracking and classifying object instances, potentially combined with a set of heuristics to merge the individual network outputs. By contrast, we employ just one single-stage network, without any heuristics or post-processing, for the end-to-end task. Our approach, called 'STEm-Seg', is a bottom-up method for Segmenting object instances in videos using Spatio-Temporal Embeddings. We achieve 3rd place in the Occluded VIS challenge with an mAP score of 21.6% on the test set.
Interactive Labeling for Human Pose Estimation in Surveillance Videos
Mickael Cormier, Fabian Röpke, T. Golda, J. Beyerer
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), October 2021. DOI: 10.1109/ICCVW54120.2021.00190 (https://doi.org/10.1109/ICCVW54120.2021.00190)

Abstract: Automatically detecting and estimating the movement of persons in real-world, uncooperative scenarios is very challenging, in great part due to limited and unreliably annotated data. For instance, annotating a single human body pose for activity recognition requires 40-60 seconds in complex sequences, leading to long-winded and costly annotation processes. Increasing the size of annotated datasets through crowdsourcing or automated annotation therefore often comes at great financial cost, while unreliable validation processes and inadequate annotation tools greatly impact annotation quality. In this work we combine multiple techniques into a single web-based, general-purpose annotation application. Pre-trained machine learning models enable annotators to interactively detect pedestrians, re-identify them throughout the sequence, estimate their poses, and correct annotation suggestions in the same interface. Annotations are then inter- and extrapolated between frames. The application is evaluated through several user studies and the results are extensively analyzed. Experiments demonstrate a 55% reduction in annotation time for less complex scenarios while simultaneously decreasing perceived annotator workload.
{"title":"Supporting Reference Imagery for Digital Drawing","authors":"Josh Holinaty, Alec Jacobson, Fanny Chevalier","doi":"10.1109/ICCVW54120.2021.00276","DOIUrl":"https://doi.org/10.1109/ICCVW54120.2021.00276","url":null,"abstract":"There is little understanding in the challenges artists face when using reference imagery while creating drawings digitally. How can this part of the creative process be better supported during the act of drawing? We conduct formative interviews with artists and reveal many adopt ad hoc strategies when integrating reference into their workflows. Interview results inform the design of a novel sketching interface in form of a technology probe to capture how artists use and access reference imagery, while also addressing opportunities to better support the use of reference, such as just-in-time presentation of imagery, automatic transparency to assist tracing, and features to mitigate design fixation. To capture how reference is used, we tasked artists to complete a series of digital drawings using our probe, with each task having particular reference needs. Artists were quick to adopt and appreciate the novel solutions provided by our probe, and we identified common strategies that can be exploited to support reference imagery in future creative tools.","PeriodicalId":226794,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)","volume":"501 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134220492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Finite Aperture Stereo: 3D Reconstruction of Macro-Scale Scenes","authors":"M. Bailey, A. Hilton, Jean-Yves Guillemaut","doi":"10.1109/ICCVW54120.2021.00280","DOIUrl":"https://doi.org/10.1109/ICCVW54120.2021.00280","url":null,"abstract":"While the accuracy of multi-view stereo (MVS) has continued to advance, its performance reconstructing challenging scenes from images with a limited depth of field is generally poor. Typical implementations assume a pinhole camera model, and therefore treat defocused regions as a source of outlier. In this paper, we address these limitations by instead modelling the camera as a thick lens. Doing so allows us to exploit the complementary nature of stereo and defocus information, and overcome constraints imposed by traditional MVS methods. Using our novel reconstruction framework, we recover complete 3D models of complex macro-scale scenes. Our approach demonstrates robustness to view-dependent materials, and outperforms state-of-the-art MVS and depth from defocus across a range of real and synthetic datasets.","PeriodicalId":226794,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)","volume":"212 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133425099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-weather city: Adverse weather stacking for autonomous driving
Valentina Mușat, I. Fursa, P. Newman, Fabio Cuzzolin, Andrew Bradley
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), October 2021. DOI: 10.1109/ICCVW54120.2021.00325 (https://doi.org/10.1109/ICCVW54120.2021.00325)

Abstract: Autonomous vehicles make use of sensors to perceive the world around them, with heavy reliance on vision-based sensors such as RGB cameras. Unfortunately, since these sensors are affected by adverse weather, perception pipelines require extensive training on visual data captured under harsh conditions in order to improve the robustness of downstream tasks, and such data is difficult and expensive to acquire. Based on GAN and CycleGAN architectures, we propose an overall (modular) architecture for constructing datasets, which allows one to add, swap out and combine components in order to generate images with diverse weather conditions. Starting from a single dataset with ground truth, we generate 7 versions of the same data in diverse weather, and propose an extension to augment the generated conditions, resulting in a total of 14 adverse weather conditions that require only a single set of ground truth. We test the quality of the generated conditions both in terms of perceptual quality and suitability for training downstream tasks, using real-world, out-of-distribution adverse weather extracted from various datasets. We show improvements in both object detection and instance segmentation across all conditions, in many cases exceeding a 10-percentage-point increase in AP, and provide the materials and instructions needed to reconstruct the multi-weather dataset, based upon the original Cityscapes dataset.
Convolutional Filter Approximation Using Fractional Calculus
J. Zamora-Esquivel, Jesus Adan Cruz Vargas, A. Rhodes, L. Nachman, Narayan Sundararajan
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), October 2021. DOI: 10.1109/ICCVW54120.2021.00047 (https://doi.org/10.1109/ICCVW54120.2021.00047)

Abstract: We introduce a generalized fractional convolutional filter (FF) with the flexibility to behave as any novel, customized, or well-known filter (e.g. Gaussian, Sobel, and Laplacian). Our method can be trained using only five parameters, regardless of the kernel size. Furthermore, these kernels can be used in place of traditional kernels in any CNN topology. We demonstrate a nominal 5X parameter compression per kernel compared to a traditional (5 × 5) convolutional kernel and, in the generalized case, a compression from N × N to 6 trainable parameters per kernel. We furthermore achieve 43X compression for 3D convolutional filters compared with conventional (7 × 7 × 7) 3D filters. Using fractional filters, we set a new MNIST record for the fewest parameters required to achieve above 99% classification accuracy, with only 3,750 trainable parameters. In addition to providing a generalizable method for CNN model compression, FFs present a compelling use case for the compression of CNNs that require large kernel sizes (e.g. medical imaging, semantic segmentation).
Identification and Measurement of Individual Roots in Minirhizotron Images of Dense Root Systems
Alexander Gillert, Bo Peters, U. V. Lukas, J. Kreyling
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), October 2021. DOI: 10.1109/ICCVW54120.2021.00153 (https://doi.org/10.1109/ICCVW54120.2021.00153)

Abstract: Semantic segmentation networks are prone to oversegmentation in areas where objects are tightly clustered. In minirhizotron images with densely packed plant root systems, this can lead to a failure to separate individual roots, thereby skewing root length and width measurements. We propose to deal with this problem by adding additional output heads to the segmentation model: one that is used with a ridge detection algorithm as an intermediate step, and a second that directly estimates root width. With this method we are able to improve detection and width measurements in densely packed root systems without negative effects on sparse root systems.
{"title":"Optical Braille Recognition Using Object Detection Neural Network","authors":"Ilya G. Ovodov","doi":"10.1109/ICCVW54120.2021.00200","DOIUrl":"https://doi.org/10.1109/ICCVW54120.2021.00200","url":null,"abstract":"Optical Braille recognition methods generally rely heavily on a Braille text’s geometric structure. They run into problems if this structure is distorted. Thus, they find it difficult to cope with images of book pages taken with a smartphone.We propose an optical Braille recognition method that uses an object detection convolutional neural network to detect whole Braille characters at once. The proposed algorithm is robust to deformations and perspective distortions of a Braille page displayed on an image. The algorithm is suitable for recognizing braille texts captured with a smartphone camera in domestic conditions. It can handle curved pages and images with perspective distortion. The proposed algorithm shows high performance and accuracy compared to existing methods.Additionally, we produced a new dataset containing 240 photos of Braille texts with annotation for each Braille letter. Both the proposed algorithm and the dataset are available at GitHub.","PeriodicalId":226794,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116904154","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A New Deep Learning Engine for CoralNet
Qimin Chen, Oscar Beijbom, Stephen Chan, J. Bouwmeester, D. Kriegman
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), October 2021. DOI: 10.1109/ICCVW54120.2021.00412 (https://doi.org/10.1109/ICCVW54120.2021.00412)

Abstract: CoralNet is a cloud-based website and platform for manual, semi-automatic and automatic analysis of coral reef images. Users access CoralNet through web-based workflows optimized for common tasks, and other systems can interface through APIs. Today, marine scientists are widely using CoralNet: nearly 3,000 registered users have uploaded 1,741,855 images from 2,040 distinct sources with over 65 million annotations. CoralNet is hosted on AWS, is free for users, and the code is open source. In January 2021, we released CoralNet 1.0, which has a new machine learning engine. This paper provides an overview of that engine, the process of choosing its particular architecture, its training, and a comparison to some of the most promising alternative architectures. In a nutshell, CoralNet 1.0 uses transfer learning with an EfficientNet-B0 backbone trained on 16M labelled patches from benthic images, and a hierarchical multi-layer perceptron classifier trained on source-specific labelled data. When evaluated on a hold-out test set of 26 sources, the error rate of CoralNet 1.0 was 18.4% (relative) lower than that of CoralNet Beta.
Visual interpretability analysis of Deep CNNs using an Adaptive Threshold method on Diabetic Retinopathy images
George Ioannou, Tasos Papagiannis, Thanos Tagaris, Georgios Alexandridis, A. Stafylopatis
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), October 2021. DOI: 10.1109/ICCVW54120.2021.00058 (https://doi.org/10.1109/ICCVW54120.2021.00058)

Abstract: Deep neural networks have been dominating the field of computer vision, achieving exceptional performance on object detection and pattern recognition. However, despite the highly accurate predictions of these models, the continuous increase in depth and complexity comes at the cost of interpretability, making the task of explaining the reasoning behind these predictions very challenging. In this paper, an analysis of state-of-the-art approaches to interpreting networks' representations is carried out over two Diabetic Retinopathy image datasets, IDRiD and DDR. Furthermore, these techniques are compared on the task of image segmentation on the same datasets, in order to discover which method produces the best attention maps, i.e. maps that can solve the segmentation problem without the network actually being trained for that task. To accomplish this, we propose an adaptive threshold method that transforms the attention masks into a representation more suitable for segmentation. Experiments over multiple architectures were conducted to ensure the robustness of the results.