{"title":"3DPoseLite: A Compact 3D Pose Estimation Using Node Embeddings","authors":"Meghal Dani, Karan Narain, R. Hebbalaguppe","doi":"10.1109/WACV48630.2021.00192","DOIUrl":"https://doi.org/10.1109/WACV48630.2021.00192","url":null,"abstract":"Efficient pose estimation finds utility in Augmented Reality (AR) and other computer vision applications such as autonomous navigation and robotics, to name a few. A compact and accurate pose estimation methodology is of paramount importance for on-device inference in such applications. Our proposed solution 3DPoseLite, estimates pose of generic objects by utilizing a compact node embedding representation, unlike computationally expensive multi-view and point-cloud representations. The neural network outputs a 3D pose, taking RGB image and its corresponding graph (obtained by skeletonizing the 3D meshes [31]) as inputs. Our approach utilizes node2vec framework to learn low-dimensional representations for nodes in a graph by optimizing a neighborhood preserving objective. We achieve a space and time reduction by a factor of 11 × and 3 × respectively, with respect to the state-of-the-art approach, Pose-FromShape [50], on benchmark Pascal3D dataset [48]. We also test the performance of our model on unseen data using Pix3D dataset.","PeriodicalId":236300,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116508399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Enhancing Fine-grained Details for Image Matting","authors":"Chang Liu, Henghui Ding, Xudong Jiang","doi":"10.1109/WACV48630.2021.00043","DOIUrl":"https://doi.org/10.1109/WACV48630.2021.00043","url":null,"abstract":"In recent years, deep natural image matting has been rapidly evolved by extracting high-level contextual features into the model. However, most current methods still have difficulties with handling tiny details, like hairs or furs. In this paper, we argue that recovering these microscopic de-tails relies on low-level but high-definition texture features. However, these features are downsampled in a very early stage in current encoder-decoder-based models, resulting in the loss of microscopic details. To address this issue, we design a deep image matting model to enhance fine-grained details. Our model consists of two parallel paths: a conventional encoder-decoder Semantic Path and an independent downsampling-free Textural Compensate Path (TCP). The TCP is proposed to extract fine-grained details such as lines and edges in the original image size, which greatly enhances the fineness of prediction. Meanwhile, to lever-age the benefits of high-level context, we propose a feature fusion unit(FFU) to fuse multi-scale features from the se-mantic path and inject them into the TCP. In addition, we have observed that poorly annotated trimaps severely affect the performance of the model. Thus we further propose a novel term in loss function and a trimap generation method to improve our model’s robustness to the trimaps. The experiments show that our method outperforms previous start-of-the-art methods on the Composition-1k dataset.","PeriodicalId":236300,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126246141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Few-Shot Learning via Feature Hallucination with Variational Inference","authors":"Qinxuan Luo, Lingfeng Wang, J. Lv, Shiming Xiang, Chunhong Pan","doi":"10.1109/WACV48630.2021.00401","DOIUrl":"https://doi.org/10.1109/WACV48630.2021.00401","url":null,"abstract":"Deep learning has achieved huge success in the field of artificial intelligence, but the performance heavily depends on labeled data. Few-shot learning aims to make a model rapidly adapt to unseen classes with few labeled samples after training on a base dataset, and this is useful for tasks lacking labeled data such as medical image processing. Considering that the core problem of few-shot learning is the lack of samples, a straightforward solution to this issue is data augmentation. This paper proposes a generative model (VI-Net) based on a cosine-classifier baseline. Specifically, we construct a framework to learn to define a generating space for each category in the latent space based on few support samples. In this way, new feature vectors can be generated to help make the decision boundary of classifier sharper during the fine-tuning process. To evaluate the effectiveness of our proposed approach, we perform comparative experiments and ablation studies on mini-ImageNet and CUB. Experimental results show that VI-Net does improve performance compared with the baseline and obtains the state-of-the-art result among other augmentation-based methods.","PeriodicalId":236300,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121736377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EAGLE-Eye: Extreme-pose Action Grader using detaiL bird’s-Eye view","authors":"Mahdiar Nekoui, Fidel Omar Tito Cruz, Li Cheng","doi":"10.1109/WACV48630.2021.00044","DOIUrl":"https://doi.org/10.1109/WACV48630.2021.00044","url":null,"abstract":"Measuring the quality of a sports action entails attending to the execution of the short-term components as well as overall impression of the whole program. In this assessment, both appearance clues and pose dynamics features should be involved. Current approaches often treat a sports routine as a simple fine-grained action, while taking little heed of its complex temporal structure. Besides, they rely solely on either appearance or pose features to score the performance. In this paper, we present JCA and ADA blocks that are responsible for reasoning about the coordination among the joints and appearance dynamics throughout the performance. We build our two-stream network upon the separate stack of these blocks. The early blocks capture the fine-grained temporal dependencies while the last ones reason about the long-term coarse-grained relations. We further introduce an annotated dataset of sports images with unusual pose configurations to boost the performance of pose estimation in such scenarios. Our experiments show that the proposed method not only outperforms the previous works in short-term action assessment but also is the first to generalize well to minute-long figure-skating scoring.","PeriodicalId":236300,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132298879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"S-VVAD: Visual Voice Activity Detection by Motion Segmentation","authors":"Muhammad Shahid, C. Beyan, Vittorio Murino","doi":"10.1109/WACV48630.2021.00238","DOIUrl":"https://doi.org/10.1109/WACV48630.2021.00238","url":null,"abstract":"We address the challenging Voice Activity Detection (VAD) problem, which determines \"Who is Speaking and When?\" in audiovisual recordings. The typical audio-based VAD systems can be ineffective in the presence of ambient noise or noise variations. Moreover, due to technical or privacy reasons, audio might not be always available. In such cases, the use of video modality to perform VAD is desirable. Almost all existing visual VAD methods rely on body part detection, e.g., face, lips, or hands. In contrast, we propose a novel visual VAD method operating directly on the entire video frame, without the explicit need of detecting a person or his/her body parts. Our method, named S-VVAD, learns body motion cues associated with speech activity within a weakly supervised segmentation framework. Therefore, it not only detects the speakers/not-speakers but simultaneously localizes the image positions of them. It is an end-to-end pipeline, person-independent and it does not require any prior knowledge nor pre-processing. S-VVAD performs well in various challenging conditions and demonstrates the state-of-the-art results on multiple datasets. Moreover, the better generalization capability of S-VVAD is confirmed for cross-dataset and person-independent scenarios.","PeriodicalId":236300,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132411795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Task Knowledge Distillation for Eye Disease Prediction","authors":"Sahil Chelaramani, Manish Gupta, Vipul Agarwal, Prashant Gupta, Ranya Habash","doi":"10.1109/WACV48630.2021.00403","DOIUrl":"https://doi.org/10.1109/WACV48630.2021.00403","url":null,"abstract":"While accurate disease prediction from retinal fundus images is critical, collecting large amounts of high quality labeled training data to build such supervised models is difficult. Deep learning classifiers have led to high accuracy results across a wide variety of medical imaging problems, but they need large amounts of labeled data. Given a fundus image, we aim to evaluate various solutions for learning deep neural classifiers using small labeled data for three tasks related to eye disease prediction: (T1) predicting one of the five broad categories – diabetic retinopathy, age-related macular degeneration, glaucoma, melanoma and normal, (T2) predicting one of the 320 fine-grained disease sub-categories, (T3) generating a textual diagnosis. The problem is challenging because of small data size, need for predictions across multiple tasks, handling image variations, and large number of hyper-parameter choices. Modeling the problem under a multi-task learning (MTL) setup, we investigate the contributions of each of the proposed tasks while dealing with a small amount of labeled data. Further, we suggest a novel MTL-based teacher ensemble method for knowledge distillation. On a dataset of 7212 labeled and 35854 unlabeled images across 3502 patients, our technique obtains ~83% accuracy, ~75% top-5 accuracy and ~48 BLEU for tasks T1, T2 and T3 respectively. Even with 15% training data, our method outperforms baselines by 8.1, 3.2 and 11.2 points for the three tasks respectively.","PeriodicalId":236300,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"04 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130007701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Size-invariant Detection of Marine Vessels From Visual Time Series","authors":"T. Marques, A. Albu, P. O'Hara, Norma Serra, Ben Morrow, L. McWhinnie, R. Canessa","doi":"10.1109/WACV48630.2021.00049","DOIUrl":"https://doi.org/10.1109/WACV48630.2021.00049","url":null,"abstract":"Marine vessel traffic is one of the main sources of negative anthropogenic impact upon marine environments. The automatic identification of boats in monitoring images facilitates conservation, research and patrolling efforts. However, the diverse sizes of vessels, the highly dynamic water surface and weather-related visibility issues significantly hinder this task. While recent deep learning (DL)-based object detectors identify well medium- and large-sized boats, smaller vessels, often responsible for substantial disturbance to sensitive marine life, are typically not detected. We propose a detection approach that combines state-of-the-art object detectors and a novel Detector of Small Marine Vessels (DSMV) to identify boats of any size. The DSMV uses a short time series of images and a novel bi-directional Gaussian Mixture technique to determine motion in combination with context-based filtering and a DL-based image classifier. Experimental results obtained on our novel datasets of images containing boats of various sizes show that the proposed approach comfortably outperforms five popular state-of-the-art object detectors. Code and datasets available at https://github.com/tunai/hybrid-boat-detection.","PeriodicalId":236300,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130285812","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal Trajectory Predictions for Autonomous Driving without a Detailed Prior Map","authors":"A. Kawasaki, A. Seki","doi":"10.1109/WACV48630.2021.00377","DOIUrl":"https://doi.org/10.1109/WACV48630.2021.00377","url":null,"abstract":"Predicting the future trajectories of surrounding vehicles is a key competence for safe and efficient real-world autonomous driving systems. Previous works have presented deep neural network models for predictions using a detailed prior map which includes driving lanes and explicitly expresses the road rules like legal traffic directions and valid paths through intersections. Since it is unrealistic to assume the existence of the detailed prior maps for all areas, we use a map generated from only perceptual data (3D points measured by a LiDAR sensor). Such maps do not explicitly denote road rules, which makes prediction tasks more difficult. To overcome this problem, we propose a novel generative adversarial network (GAN) based framework. A discriminator in our framework can distinguish whether predicted trajectories follow road rules, and a generator can predict trajectories following it. Our framework implicitly extracts road rules by projecting trajectories onto the map via a differentiable function and training positional relations between trajectories and obstacles on the map. We also extend our framework to multimodal predictions so that various future trajectories are predicted. Experimental results show that our method outperforms other state-of-the-art methods in terms of trajectory errors and the ratio of trajectories that fall on drivable lanes.","PeriodicalId":236300,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"13 5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131056130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptiope: A Modern Benchmark for Unsupervised Domain Adaptation","authors":"Tobias Ringwald, R. Stiefelhagen","doi":"10.1109/WACV48630.2021.00015","DOIUrl":"https://doi.org/10.1109/WACV48630.2021.00015","url":null,"abstract":"Unsupervised domain adaptation (UDA) deals with the adaptation process of a given source domain with labeled training data to a target domain for which only unannotated data is available. This is a challenging task as the domain shift leads to degraded performance on the target domain data if not addressed. In this paper, we analyze commonly used UDA classification datasets and discover systematic problems with regard to dataset setup, ground truth ambiguity and annotation quality. We manually clean the most popular UDA dataset in the research area (Office-31) and quantify the negative effects of inaccurate annotations through thorough experiments. Based on these insights, we collect the Adaptiope dataset - a large scale, diverse UDA dataset with synthetic, product and real world data - and show that its transfer tasks provide a challenge even when considering recent UDA algorithms. Our datasets are available at https://gitlab.com/tringwald/adaptiope.","PeriodicalId":236300,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131037340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Misclassification Risk and Uncertainty Quantification in Deep Classifiers","authors":"Murat Sensoy, Maryam Saleki, S. Julier, Reyhan Aydoğan, John Reid","doi":"10.1109/WACV48630.2021.00253","DOIUrl":"https://doi.org/10.1109/WACV48630.2021.00253","url":null,"abstract":"In this paper, we propose risk-calibrated evidential deep classifiers to reduce the costs associated with classification errors. We use two main approaches. The first is to develop methods to quantify the uncertainty of a classifier’s predictions and reduce the likelihood of acting on erroneous predictions. The second is a novel way to train the classifier such that erroneous classifications are biased towards less risky categories. We combine these two approaches in a principled way. While doing this, we extend evidential deep learning with pignistic probabilities, which are used to quantify uncertainty of classification predictions and model rational decision making under uncertainty.We evaluate the performance of our approach on several image classification tasks. We demonstrate that our approach allows to (i) incorporate misclassification cost while training deep classifiers, (ii) accurately quantify the uncertainty of classification predictions, and (iii) simultaneously learn how to make classification decisions to minimize expected cost of classification errors.","PeriodicalId":236300,"journal":{"name":"2021 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"154 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133004341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}