{"title":"Advanced 3D Deep Non-Local Embedded System for Self-Augmented X-Ray-based COVID-19 Assessment","authors":"F. Rundo, A. Genovese, R. Leotta, F. Scotti, V. Piuri, S. Battiato","doi":"10.1109/ICCVW54120.2021.00051","DOIUrl":"https://doi.org/10.1109/ICCVW54120.2021.00051","url":null,"abstract":"COVID-19 diagnosis using chest x-ray (CXR) imaging has a greater sensitivity and faster acquisition procedures than the Real-Time Polimerase Chain Reaction (RT-PCR) test, also requiring radiology machinery that is cheap and widely available. To process the CXR images, methods based on Deep Learning (DL) are being increasingly used, often in combination with data augmentation techniques. However, no method in the literature performs data augmentation in which the augmented training samples are processed collectively as a multi-channel image. Furthermore, no approach has yet considered a combination of attention-based networks with Convolutional Neural Networks (CNN) for COVID-19 detection. In this paper, we propose the first method for COVID-19 detection from CXR images that uses an innovative self-augmentation scheme based on reinforcement learning, which combines all the augmented images in a 3D deep volume and processes them together using a novel non-local deep CNN, which integrates convolutional and attention layers based on non-local blocks. Results on publicly-available databases exhibit a greater accuracy than the state of the art, also showing that the regions of CXR images influencing the decision are consistent with radiologists’ observations.","PeriodicalId":226794,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127366116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Laplacians in Chebyshev Graph Convolutional Networks","authors":"H. Sahbi","doi":"10.1109/ICCVW54120.2021.00234","DOIUrl":"https://doi.org/10.1109/ICCVW54120.2021.00234","url":null,"abstract":"Spectral graph convolutional networks (GCNs) are particular deep models which aim at extending neural networks to arbitrary irregular domains. The principle of these networks consists in projecting graph signals using the eigen-decomposition of their Laplacians, then achieving filtering in the spectral domain prior to back-project the resulting filtered signals onto the input graph domain. However, the success of these operations is highly dependent on the relevance of the used Laplacians which are mostly handcrafted and this makes GCNs clearly sub-optimal. In this paper, we introduce a novel spectral GCN that learns not only the usual convolutional parameters but also the Laplacian operators. The latter are designed \"end-to-end\" as a part of a recursive Chebyshev decomposition with the particularity of conveying both the differential and the non-differential properties of the learned representations – with increasing order and discrimination power – without overparametrizing the trained GCNs. Extensive experiments, conducted on the challenging task of skeleton-based action recognition, show the generalization ability and the outperformance of our proposed Laplacian design w.r.t. different baselines (built upon handcrafted and other learned Laplacians) as well as the related work.","PeriodicalId":226794,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126663962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Input Fusion for Practical Pedestrian Intention Prediction","authors":"Ankur Singh, U. Suddamalla","doi":"10.1109/ICCVW54120.2021.00260","DOIUrl":"https://doi.org/10.1109/ICCVW54120.2021.00260","url":null,"abstract":"Pedestrians are the most vulnerable road users and are at a high risk of fatal accidents. Accurate pedestrian detection and effectively analyzing their intentions to cross the road are critical for autonomous vehicles and ADAS solutions to safely navigate public roads. Faster and precise estimation of pedestrian intention helps in adopting safe driving behavior. Visual pose and motion are two important cues that have been previously employed to determine pedestrian intention. However, motion patterns can give erroneous results for short-term video sequences and are thus prone to mistakes. In this work, we propose an intention prediction network that utilizes pedestrian bounding boxes, pose, bounding box coordinates, and takes advantage of global context along with the local setting. This network implicitly learns pedestrians’ motion cues and location information to differentiate between a crossing and a non-crossing pedestrian. We experiment with different combinations of input features and propose multiple efficient models in terms of accuracy and inference speeds. Our best-performing model shows around 85% accuracy on the JAAD dataset.","PeriodicalId":226794,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127493258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Distribution Learning with Statistical Hypothesis Testing for COVID-19 CT Scan Classification","authors":"Guan-Lin Chen, Chih-Chung Hsu, Mei-Hsuan Wu","doi":"10.1109/ICCVW54120.2021.00057","DOIUrl":"https://doi.org/10.1109/ICCVW54120.2021.00057","url":null,"abstract":"With the massive damage in the world caused by Coronavirus Disease 2019 SARS-CoV-2 (COVID-19), many related research topics have been proposed in the past two years. The Chest Computed Tomography (CT) scan is the most valuable materials to diagnose the COVID-19 symptoms. However, most schemes for COVID-19 classification of Chest CT scan are based on single slice-level schemes, implying that the most critical CT slice should be selected from the original CT volume manually. In this paper, a statistical hypothesis test is adopted to the deep neural network to learn the implicit representation of CT slices. Specifically, we propose an Adaptive Distribution Learning with Statistical hypothesis Testing (ADLeaST) for COVID-19 CT scan classification can be used to judge the importance of each slice in CT scan and followed by adopting the non-parametric statistics method, Wilcoxon signed-rank test, to make predicted result explainable and stable. In this way, the impact of out-of-distribution (OOD) samples can be significantly reduced. Meanwhile, a self-attention mechanism without statistical analysis is also introduced into the back-bone network to learn the importance of the slices explicitly. The extensive experiments show that both the proposed schemes are stable and superior. Our experiments also demonstrated that the proposed ADLeaST significantly outperforms the state-of-the-art methods.","PeriodicalId":226794,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124901570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine learning meets distinctness in variety testing","authors":"Geoffroy Couasnet, Mouad Zine El Abidine, F. Laurens, H. Dutagaci, D. Rousseau","doi":"10.1109/ICCVW54120.2021.00151","DOIUrl":"https://doi.org/10.1109/ICCVW54120.2021.00151","url":null,"abstract":"Distinctness is a binary trait used in variety testing to determine if a new plant variety can be considered distinct or not from a set of already existing varieties. Currently distinctness is mostly based on human visual perception. This communication considers distinctness with a machine learning perspective where distinctness is evaluated through an identification process based on information extraction from machine vision. Illustrations are provided on apple variety testing to perform distinctness based on color. An automated pipeline of image acquisition, processing and supervised learning is proposed. A feature space based on the 3D color histogram of a set of apples is built. This feature space is built using optimal transport, fractal dimension, mutual entropy and fractional anisotropy and it provides results in accordance with human expertise when applied to a set of varieties highly contrasted in color and another one with low color contrast. These results open new research directions for achieving higher-throughput, higher reproducibility and higher statistical confidence in variety testing.","PeriodicalId":226794,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125088996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CryoPoseNet: End-to-End Simultaneous Learning of Single-particle Orientation and 3D Map Reconstruction from Cryo-electron Microscopy Data","authors":"Y. Nashed, F. Poitevin, Harshit Gupta, G. Woollard, M. Kagan, C. Yoon, D. Ratner","doi":"10.1109/ICCVW54120.2021.00452","DOIUrl":"https://doi.org/10.1109/ICCVW54120.2021.00452","url":null,"abstract":"Cryogenic electron microscopy (cryo-EM) provides images from different copies of the same biomolecule in arbitrary orientations. Here, we present an end-to-end unsupervised approach that learns individual particle orientations directly from cryo-EM data while reconstructing the 3D map of the biomolecule following random initialization. The approach relies on an auto-encoder architecture where the latent space is explicitly interpreted as orientations used by the decoder to form an image according to the physical projection model. We evaluate our method on simulated data and show that it is able to reconstruct 3D particle maps from noisy- and CTF-corrupted 2D projection images of unknown particle orientations.","PeriodicalId":226794,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121882263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Spatio-Appearance Memory Network for High-Performance Visual Tracking","authors":"Fei Xie, Wankou Yang, Kaihua Zhang, Bo Liu, Guangting Wang, W. Zuo","doi":"10.1109/ICCVW54120.2021.00302","DOIUrl":"https://doi.org/10.1109/ICCVW54120.2021.00302","url":null,"abstract":"Segmentation-based tracking is currently a promising tracking paradigm due to the robustness towards non-grid deformations, comparing to the traditional box-based tracking methods. However, existing segmentation-based trackers are insufficient in modeling and exploiting dense pixel-wise correspondence across frames. To overcome these limitations, this paper presents a novel segmentation-based tracking architecture equipped with spatio-appearance memory networks. The appearance memory network utilizes spatio-temporal non-local similarity to propagate segmentation mask to the current frame, which can effectively capture long-range appearance variations and we further treat discriminative correlation filter as spatial memory bank to store the mapping between feature map and spatial map. Moreover, mutual promotion on dual memory networks greatly boost the overall tracking performance. We further propose a dynamic memory machine (DMM) which employs the Earth Mover’s Distance (EMD) to reweight memory samples. Without bells and whistles, our simple-yet-effective tracking architecture sets a new state-of-the-art on six tracking benchmarks. Besides, our approach achieves comparable results on two video object segmentation benchmarks. Code and model are released at https://github.com/phiphiphi31/DMB.","PeriodicalId":226794,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)","volume":"176 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121541197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Contrastive Feature Loss for Image Prediction","authors":"A. Andonian, Taesung Park, Bryan C. Russell, Phillip Isola, Jun-Yan Zhu, Richard Zhang","doi":"10.1109/ICCVW54120.2021.00220","DOIUrl":"https://doi.org/10.1109/ICCVW54120.2021.00220","url":null,"abstract":"Training supervised image synthesis models requires a critic to compare two images: the ground truth to the result. Yet, this basic functionality remains an open problem. A popular line of approaches uses the L1 (mean absolute error) loss, either in the pixel or the feature space of pretrained deep networks. However, we observe that these losses tend to produce overly blurry and grey images, and other techniques such as GANs need to be employed to fight these artifacts. In this work, we introduce an information theory based approach to measuring similarity between two images. We argue that a good reconstruction should have high mutual information with the ground truth. This view enables learning a lightweight critic to \"calibrate\" a feature space in a contrastive manner, such that reconstructions of corresponding spatial patches are brought together, while other patches are repulsed. We show that our formulation immediately boosts the perceptual realism of output images when used as a drop-in replacement for the L1 loss, with or without an additional GAN loss.","PeriodicalId":226794,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)","volume":"122 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131474419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GIAOTracker: A comprehensive framework for MCMOT with global information and optimizing strategies in VisDrone 2021","authors":"Yunhao Du, Jun-Jun Wan, Yanyun Zhao, Binyu Zhang, Zhihang Tong, Junhao Dong","doi":"10.1109/ICCVW54120.2021.00315","DOIUrl":"https://doi.org/10.1109/ICCVW54120.2021.00315","url":null,"abstract":"In recent years, algorithms for multiple object tracking tasks have benefited from great progresses in deep models and video quality. However, in challenging scenarios like drone videos, they still suffer from problems, such as small objects, camera movements and view changes. In this paper, we propose a new multiple object tracker, which employs Global Information And some Optimizing strategies, named GIAOTracker It consists of three stages, i.e., online tracking, global link and post-processing. Given detections in every frame, the first stage generates reliable track- lets using information of camera motion, object motion and object appearance. Then they are associated into trajectories by exploiting global clues and refined through four post-processing methods. With the effectiveness of the three stages, GIAOTracker achieves state-of-the-art performance on the VisDrone MOT dataset and wins the 2nd place in the VisDrone2021 MOT Challenge.","PeriodicalId":226794,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)","volume":"110 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131645733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}