2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW): Latest Publications

Single Patch Based 3D High-Fidelity Mask Face Anti-Spoofing
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) Pub Date: 2021-10-01 DOI: 10.1109/ICCVW54120.2021.00099
Samuel Huang, Wen-Huang Cheng, Robert Cheng
Abstract: Face anti-spoofing is rapidly increasing in importance as facial recognition systems have become common in the financial and security fields. Among all kinds of attacks, 3D high-fidelity masks are especially hard to defend against. Recently, CASIA introduced a large-scale dataset, CASIA-SURF HiFiMask, which comprises 54,600 videos recorded from 75 subjects wearing 225 high-fidelity masks. In this paper, we design a lightweight network with single-patch input on the basis of CDCN++ and supervise it with focal loss. The proposed method achieves an Average Classification Error Rate (ACER) of 3.215 on Protocol 3 of the CASIA-SURF HiFiMask dataset and ranks as the third-best model in the Chalearn 3D High-Fidelity Mask Face Presentation Attack Detection Challenge at ICCV 2021.
Citations: 2
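The abstract above mentions supervising a lightweight CDCN++-based patch classifier with focal loss. Below is a minimal PyTorch sketch of binary focal loss; the class-balancing weight `alpha` and focusing parameter `gamma` use the common defaults and are assumptions (the paper's exact settings are not given here), and the backbone itself is not reproduced.

```python
import torch
import torch.nn.functional as F

def binary_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Focal loss for binary live-vs-spoof classification.

    logits:  raw scores of shape (N,); targets: 0/1 labels of shape (N,).
    alpha and gamma follow the usual defaults from Lin et al.; the paper's
    actual hyperparameters may differ.
    """
    probs = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets.float(), reduction="none")
    p_t = probs * targets + (1 - probs) * (1 - targets)      # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # class-balancing weight
    loss = alpha_t * (1 - p_t) ** gamma * ce                 # down-weight easy examples
    return loss.mean()

# toy usage
logits = torch.randn(8)
labels = torch.randint(0, 2, (8,))
print(binary_focal_loss(logits, labels))
```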
Manipulating Image Style Transformation via Latent-Space SVM
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) Pub Date: 2021-10-01 DOI: 10.1109/ICCVW54120.2021.00218
Qiudan Wang
Abstract: Deep Neural Networks have proven to be the go-to approach for modeling data distributions in a latent space, especially in Neural Style Transfer (NST), which casts a specific style extracted from a source image onto another target image by calibrating the style and content information in a latent space. While existing methods focus on different ways to extract features that more precisely describe style or content information to improve existing NST pipelines, the latent space of the NST model has not been well explored. In this paper, we show that different half-spaces in the latent space are actually associated with particular styles of a network's generated images. The corresponding constraints of these half-spaces can be computed using linear classifiers, e.g., a Support Vector Machine (SVM). Leveraging this understanding of the relation between half-spaces in the latent space and output style, we propose Linear Modification for Latent Representations (LMLR), a method that effectively increases or decreases the level of stylization in the output image for any given NST model. We empirically evaluate our method on several state-of-the-art NST models and show that LMLR can manipulate the level of stylization in the output image.
Citations: 0
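The LMLR idea described above (separating latent codes by style level with a linear classifier, then shifting codes along the hyperplane normal) can be illustrated with a small sketch. The code below is a generic latent-space SVM edit, not the paper's implementation; the latent dimensionality, step size `alpha`, and the way style labels are obtained are all assumptions.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Assume we have latent codes (e.g., from an NST model's bottleneck) that were
# labeled as "weakly stylized" (0) or "strongly stylized" (1) by some criterion.
latents = rng.normal(size=(200, 64))
labels = (latents[:, 0] + 0.3 * rng.normal(size=200) > 0).astype(int)

# Fit a linear SVM; its weight vector is the normal of the separating hyperplane.
svm = LinearSVC(C=1.0, max_iter=10000).fit(latents, labels)
direction = svm.coef_[0] / np.linalg.norm(svm.coef_[0])

def edit_style(z, alpha):
    """Move a latent code along the style direction.

    Positive alpha pushes toward the 'strongly stylized' half-space,
    negative alpha pushes away from it.
    """
    return z + alpha * direction

z = latents[0]
print(svm.decision_function([z, edit_style(z, +2.0), edit_style(z, -2.0)]))
```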
Cross-modal Relational Reasoning Network for Visual Question Answering
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) Pub Date: 2021-10-01 DOI: 10.1109/ICCVW54120.2021.00441
Hongyu Chen, Ruifang Liu, Bo Peng
Abstract: Visual Question Answering (VQA) is a challenging task that requires a cross-modal understanding of images and questions, with relational reasoning leading to the correct answer. To bridge the semantic gap between these two modalities, previous works focus on word-region alignments over all possible pairs without paying more attention to the corresponding word and object. Treating all pairs equally, without considering relation consistency, hinders the model's performance. In this paper, to align relation-consistent pairs and improve the interpretability of VQA systems, we propose a Cross-modal Relational Reasoning Network (CRRN) that masks the inconsistent attention map and highlights the full latent alignments of corresponding word-region pairs. Specifically, we present two relational masks for inter-modal and intra-modal highlighting, inferring the more and less important words in sentences or regions in images. The attention interrelationship of consistent pairs can be enhanced by shifting the learning focus through masking of the unaligned relations. We then propose two novel losses, ℒ_CMAM and ℒ_SMAM, with explicit supervision to capture the fine-grained interplay between vision and language. We conduct thorough experiments that demonstrate the effectiveness of the approach, reaching a competitive 61.74% on the GQA benchmark.
Citations: 3
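The relational masking described above amounts to suppressing attention scores for word-region pairs judged inconsistent. The sketch below shows generic masked cross-attention between word and region features; it is illustrative only, and the mask here is a placeholder (the paper derives its masks from relation consistency, which is not reproduced).

```python
import torch
import torch.nn.functional as F

def masked_cross_attention(words, regions, mask):
    """Attend words over image regions, suppressing masked pairs.

    words:   (num_words, d) question token features
    regions: (num_regions, d) image region features
    mask:    (num_words, num_regions) with 1 for pairs to keep, 0 to suppress
    """
    d = words.size(-1)
    scores = words @ regions.t() / d ** 0.5                  # raw word-region affinities
    scores = scores.masked_fill(mask == 0, float("-inf"))    # drop inconsistent pairs
    attn = F.softmax(scores, dim=-1)                         # normalized attention per word
    return attn @ regions                                    # region-aware word features

words = torch.randn(6, 32)
regions = torch.randn(10, 32)
mask = torch.ones(6, 10)
mask[0, 5:] = 0  # pretend the last regions are inconsistent with word 0
print(masked_cross_attention(words, regions, mask).shape)  # torch.Size([6, 32])
```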
DriPE: A Dataset for Human Pose Estimation in Real-World Driving Settings
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) Pub Date: 2021-10-01 DOI: 10.1109/ICCVW54120.2021.00321
Romain Guesdon, C. Crispim, L. Tougne
Abstract: The task of 2D human pose estimation has seen a significant gain in performance with the advent of deep learning. This task aims to estimate the body keypoints of people in an image or a video. However, real-life applications of such methods bring new challenges that are under-represented in general-context datasets. For instance, driver status monitoring in consumer road vehicles introduces new difficulties, like self- and background body-part occlusions, varying illumination conditions, cramped view angles, etc. These monitoring conditions are currently absent from general-purpose datasets. This paper proposes two main contributions. First, we introduce DriPE (Driver Pose Estimation), a new dataset to foster the development and evaluation of methods for human pose estimation of drivers in consumer vehicles. This is the first publicly available dataset depicting drivers in real scenes. It contains 10k images of 19 different driver subjects, manually annotated with human body keypoints and an object bounding box. Second, we propose a new keypoint-based metric for human pose estimation. This metric highlights the limitations of current metrics for HPE evaluation and of current deep neural networks for pose estimation, on both general and driving-related datasets.
Citations: 10
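As a point of reference for keypoint-based metrics like the one proposed above, the sketch below computes the standard PCK (Percentage of Correct Keypoints). This is not the paper's new metric, only the common baseline such a metric would be compared against; the bounding-box normalization and the 0.2 threshold are assumptions.

```python
import numpy as np

def pck(pred, gt, visible, bbox_size, thresh=0.2):
    """Percentage of Correct Keypoints, normalized by bounding-box size.

    pred, gt:   (num_keypoints, 2) predicted / ground-truth (x, y) positions
    visible:    (num_keypoints,) boolean mask of annotated keypoints
    bbox_size:  scalar used to normalize distances (e.g., max of box width/height)
    A keypoint counts as correct if its error is below thresh * bbox_size.
    """
    dists = np.linalg.norm(pred - gt, axis=1)
    correct = (dists < thresh * bbox_size) & visible
    return correct.sum() / max(visible.sum(), 1)

gt = np.array([[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]])
pred = gt + np.array([[1.0, 1.0], [8.0, 0.0], [0.5, -0.5]])
print(pck(pred, gt, visible=np.array([True, True, True]), bbox_size=50.0))
```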
SketchyDepth: from Scene Sketches to RGB-D Images
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) Pub Date: 2021-10-01 DOI: 10.1109/ICCVW54120.2021.00274
G. Berardi, Samuele Salti, L. D. Stefano
Abstract: Sketch-based content generation is a creative and fun activity, suited to both casual and professional users, with many different applications. Today it is possible to generate the geometry and appearance of a single object by sketching it, yet only the appearance can be synthesized from a sketch of a whole scene. In this paper we propose the first method to generate both the depth map and the image of a whole scene from a sketch. We demonstrate how generating geometrical information as a depth map is beneficial from a twofold perspective. On the one hand, it improves the quality of the image synthesized from the sketch. On the other, it unlocks depth-enabled creative effects like bokeh, fog, light variation, 3D photos and many others, which help enhance the final output in a controlled way. We validate our method by showing that generating depth maps directly from sketches produces better qualitative results than alternative methods, i.e., running MiDaS after image generation. Finally we introduce depth sketching, a depth manipulation technique to further condition image generation without the need for additional annotation or training.
Citations: 0
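One of the depth-enabled effects mentioned above, bokeh, can be approximated by blending a sharp image with a blurred copy according to each pixel's distance from a chosen focal depth. The snippet below is a simplified illustration of that idea, not the paper's rendering pipeline; the blur strength and focal plane are arbitrary.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fake_bokeh(image, depth, focal_depth, sigma=5.0):
    """Depth-dependent blur: pixels far from focal_depth take the blurred image.

    image: (H, W, 3) float array in [0, 1]; depth: (H, W) float array.
    """
    blurred = np.stack([gaussian_filter(image[..., c], sigma) for c in range(3)], axis=-1)
    # Blend weight grows with distance from the focal plane (clipped to [0, 1]).
    weight = np.clip(np.abs(depth - focal_depth) / (depth.max() - depth.min() + 1e-8), 0, 1)
    return image * (1 - weight[..., None]) + blurred * weight[..., None]

h, w = 64, 64
image = np.random.rand(h, w, 3)
depth = np.tile(np.linspace(0.0, 10.0, w), (h, 1))  # depth increasing left to right
out = fake_bokeh(image, depth, focal_depth=2.0)
print(out.shape, out.min() >= 0, out.max() <= 1)
```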
Where Did I See It? Object Instance Re-Identification with Attention
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) Pub Date: 2021-10-01 DOI: 10.1109/ICCVW54120.2021.00038
Vaibhav Bansal, G. Foresti, N. Martinel
Abstract: Existing methods dealing with object instance re-identification (OIRe-ID) look for the best visual-feature match of a target object within a set of frames. Due to the nature of the problem, relying only on the visual appearance of object instances is likely to produce many false matches when there are multiple objects with similar appearance, or multiple instances of the same object class, present in the scene. We focus on a rigid scene setup and, to limit the negative effects of the aforementioned cases, we propose to exploit the background information. We believe this is particularly helpful in a rigid environment with many reoccurring identical object models, since it provides rich context information. We introduce an attention-based mechanism into the existing Mask R-CNN architecture so that we learn to encode the important and distinct information in the background jointly with the foreground features relevant to rigid real-world scenarios. To evaluate the proposed approach, we run compelling experiments on the ScanNet dataset. Results demonstrate that we significantly outperform several baselines and state-of-the-art methods.
Citations: 4
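The attention-based mechanism referenced above is not specified in detail in the abstract; below is a generic spatial-attention block of the kind commonly attached to CNN feature maps such as those produced inside Mask R-CNN. It is a schematic stand-in, not the authors' module.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Reweights a feature map with a learned per-location attention mask."""

    def __init__(self, in_channels):
        super().__init__()
        # 1x1 conv collapses channels to a single attention logit per location.
        self.score = nn.Conv2d(in_channels, 1, kernel_size=1)

    def forward(self, x):
        attn = torch.sigmoid(self.score(x))   # (N, 1, H, W) in [0, 1]
        return x * attn, attn                 # attended features + the mask itself

features = torch.randn(2, 256, 32, 32)        # e.g., one backbone feature level
attended, mask = SpatialAttention(256)(features)
print(attended.shape, mask.shape)
```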
Real-Time Cell Counting in Unlabeled Microscopy Images
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) Pub Date: 2021-10-01 DOI: 10.1109/ICCVW54120.2021.00083
Yuang Zhu, Zhao Chen, Yuxin Zheng, Qinghua Zhang, Xuan Wang
Abstract: Deep learning is widely applied to cell counting in microscopy images. However, most existing cell counting models are fully supervised and trained offline. They adopt the usual training-testing framework, in which the models are trained in advance to infer the numbers of cells in test images. They require large amounts of manually labeled data for training but lack the ability to adapt to newly collected unlabeled images that are fed to processing systems dynamically. To solve these problems, we propose a novel framework for real-time (RT) cell counting with density maps (DM). It is a semi-supervised system that enables training on upcoming unlabeled images while simultaneously predicting their cell counts. It is also flexible enough to allow almost any cell counting model to be embedded within it. With a reliable and automatic training-set renewal mechanism, it ensures counting accuracy while optimizing the models with both historical data and new images. To deal with cell variability and image complexity, we propose a Semi-supervised Graph-Based Network (SGN) for use within the RT counting framework. It leverages a count-sensitive measurement to construct dynamic graphs of DM patches. With the graph constraint, it regularizes an encoder-decoder to represent underlying data structures and gain robustness for cell counting. We have implemented SGN along with several baseline networks and state-of-the-art methods within the RT counting framework. Experimental results validate the effectiveness and robustness of SGN. They also demonstrate the feasibility, efficacy and generalizability of the proposed framework for cell counting in unlabeled images.
Citations: 0
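Density-map-based counting, which the framework above builds on, represents each annotated cell as a small Gaussian blob; the integral of the map equals the cell count, so a network trained to regress the map yields a count by summation. The sketch below builds such a ground-truth density map; the Gaussian width is an assumption.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def make_density_map(points, shape, sigma=4.0):
    """Ground-truth density map: one Gaussian per annotated cell center.

    points: iterable of (row, col) cell centers; shape: (H, W) of the image.
    The map sums (approximately) to the number of cells.
    """
    dot_map = np.zeros(shape, dtype=np.float64)
    for r, c in points:
        dot_map[int(r), int(c)] += 1.0
    return gaussian_filter(dot_map, sigma)

cells = [(20, 30), (50, 80), (90, 90)]
density = make_density_map(cells, shape=(128, 128))
print("estimated count:", density.sum())  # close to 3 (blobs near borders lose a little mass)
```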
DeepDraper: Fast and Accurate 3D Garment Draping over a 3D Human Body
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) Pub Date: 2021-10-01 DOI: 10.1109/ICCVW54120.2021.00163
Lokender Tiwari, B. Bhowmick
Abstract: Draping a 3D human mesh has garnered broad interest due to its wide applicability in virtual try-on, animation, etc. The 3D garment deformations produced by existing methods are often inconsistent with the body shape, pose, and measurements. This paper proposes a single unified learning-based framework (DeepDraper) to predict garment deformation as a function of body shape, pose, measurements, and garment style. We train DeepDraper with coupled geometric and multi-view perceptual losses. Unlike existing methods, we additionally model garment deformations as a function of standard body measurements, which a buyer or a designer generally uses to buy or design perfectly fitting clothes. As a result, DeepDraper significantly outperforms state-of-the-art deep network-based approaches in terms of fit and realism, and generalizes well to unseen garment styles. In addition, DeepDraper is about 10 times smaller and about 23 times faster than the closest state-of-the-art method (TailorNet), which favors its use in real-time applications with less computational power. Despite being trained on the static poses of the TailorNet [32] dataset, DeepDraper generalizes well to unseen body shapes, poses, and garment styles, and produces temporally coherent garment deformations on pose sequences even from the unseen AMASS [25] dataset.
Citations: 1
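The framework above regresses garment deformation from body shape, pose, measurements, and style. A common way to set this up is a small MLP that predicts per-vertex displacements added to a garment template mesh; the sketch below illustrates that general pattern with arbitrary dimensions and is not DeepDraper's actual architecture or losses.

```python
import torch
import torch.nn as nn

class GarmentDeformer(nn.Module):
    """Toy regressor: (shape, pose, measurements, style) -> per-vertex offsets."""

    def __init__(self, num_vertices, shape_dim=10, pose_dim=72, meas_dim=8, style_dim=4):
        super().__init__()
        in_dim = shape_dim + pose_dim + meas_dim + style_dim
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, num_vertices * 3),
        )

    def forward(self, shape, pose, measurements, style, template):
        # template: (num_vertices, 3) garment mesh in a canonical pose.
        feats = torch.cat([shape, pose, measurements, style], dim=-1)
        offsets = self.mlp(feats).view(-1, template.shape[0], 3)
        return template + offsets             # draped garment vertices

num_v = 500
model = GarmentDeformer(num_v)
template = torch.randn(num_v, 3)
out = model(torch.randn(1, 10), torch.randn(1, 72), torch.randn(1, 8), torch.randn(1, 4), template)
print(out.shape)  # torch.Size([1, 500, 3])
```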
SS-SFDA: Self-Supervised Source-Free Domain Adaptation for Road Segmentation in Hazardous Environments
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) Pub Date: 2021-10-01 DOI: 10.1109/ICCVW54120.2021.00339
D. Kothandaraman, Rohan Chandra, Dinesh Manocha
Abstract: We present a novel approach for unsupervised road segmentation in adverse weather conditions such as rain or fog. This includes a new algorithm for source-free domain adaptation (SFDA) using self-supervised learning. Moreover, our approach uses several techniques to address various challenges in SFDA and improve performance, including online generation of pseudo-labels and self-attention, as well as curriculum learning, entropy minimization and model distillation. We have evaluated the performance on 6 datasets corresponding to real and synthetic adverse weather conditions. Our method outperforms all prior works on unsupervised road segmentation and SFDA by at least 10.26%, and improves the training time by 18−180×. Moreover, our self-supervised algorithm achieves mIoU accuracy similar to prior supervised methods.
Citations: 21
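Two of the ingredients named above, online pseudo-labeling and entropy minimization, can be sketched as simple losses on a segmentation network's per-pixel predictions. The snippet below shows generic versions of both; the confidence threshold and loss weighting are assumptions, and the paper's full curriculum and distillation steps are not shown.

```python
import torch
import torch.nn.functional as F

def self_training_losses(logits, conf_thresh=0.9, entropy_weight=0.1):
    """Pseudo-label cross-entropy on confident pixels + entropy minimization.

    logits: (N, C, H, W) raw segmentation scores on unlabeled target images.
    """
    probs = F.softmax(logits, dim=1)
    conf, pseudo = probs.max(dim=1)                      # per-pixel confidence and label
    mask = conf > conf_thresh                            # keep only confident pixels
    ce = F.cross_entropy(logits, pseudo, reduction="none")
    pseudo_loss = (ce * mask).sum() / mask.sum().clamp(min=1)
    entropy = -(probs * torch.log(probs + 1e-8)).sum(dim=1).mean()
    return pseudo_loss + entropy_weight * entropy

logits = torch.randn(2, 2, 64, 64)  # e.g., road vs. non-road
print(self_training_losses(logits))
```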
PatchAugment: Local Neighborhood Augmentation in Point Cloud Classification
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) Pub Date: 2021-10-01 DOI: 10.1109/ICCVW54120.2021.00240
Shivanand Venkanna Sheshappanavar, Vinit Veerendraveer, C. Kambhamettu
Abstract: Recent deep neural network models trained on smaller and less diverse datasets use data augmentation to alleviate limitations such as overfitting, reduced robustness, and lower generalization. Methods using 3D datasets are among the most common users of data augmentation techniques such as random point drop, scaling, translation, rotation, and jittering. However, these data augmentation techniques are fixed and are often applied to the entire object, ignoring the object's local geometry. Different local neighborhoods on the object surface hold different amounts of geometric complexity. Applying the same data augmentation techniques at the object level is less effective in augmenting local neighborhoods with complex structures. This paper presents PatchAugment, a data augmentation framework that applies different augmentation techniques to local neighborhoods. Our experimental studies on the PointNet++ and DGCNN models demonstrate the effectiveness of PatchAugment on the task of 3D point cloud classification. We evaluated our technique against these models using four benchmark datasets: ModelNet40 (synthetic), ModelNet10 (synthetic), SHREC'16 (synthetic) and ScanObjectNN (real-world).
Citations: 10
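The local-neighborhood augmentation idea above can be illustrated by querying a neighborhood around a sampled center point and jittering only those points, leaving the rest of the cloud untouched. The sketch below is a generic illustration; the neighborhood size, jitter magnitude, and choice of augmentation are assumptions, not the paper's settings.

```python
import numpy as np
from scipy.spatial import cKDTree

def augment_local_patch(points, center_idx, k=32, jitter_std=0.01, rng=None):
    """Jitter only the k-nearest-neighbor patch around one point.

    points: (N, 3) point cloud; center_idx: index of the patch center.
    Returns a copy of the cloud with noise added inside the selected patch.
    """
    if rng is None:
        rng = np.random.default_rng()
    tree = cKDTree(points)
    _, idx = tree.query(points[center_idx], k=k)      # indices of the local neighborhood
    augmented = points.copy()
    augmented[idx] += rng.normal(scale=jitter_std, size=(k, 3))
    return augmented

cloud = np.random.rand(1024, 3)
out = augment_local_patch(cloud, center_idx=0)
print(np.abs(out - cloud).max())  # only the 32 patch points moved
```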