Progressive Fusion Video Super-Resolution Network via Exploiting Non-Local Spatio-Temporal Correlations
Peng Yi, Zhongyuan Wang, Kui Jiang, Junjun Jiang, Jiayi Ma
2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 3106-3115. DOI: 10.1109/ICCV.2019.00320
Abstract: How to effectively fuse temporal information from consecutive frames plays an important role in video super-resolution (SR), yet most previous fusion strategies either fail to fully utilize temporal information or cost too much time. In this study, we propose a novel progressive fusion network for video SR, which is designed to make better use of spatio-temporal information and proves to be more efficient and effective than the existing direct fusion, slow fusion, or 3D convolution strategies. Under this progressive fusion framework, we further introduce an improved non-local operation to avoid the complex motion estimation and motion compensation (ME&MC) procedures used in previous video SR approaches. Extensive experiments on public datasets demonstrate that our method surpasses the state of the art by 0.96 dB on average, runs about 3 times faster, and requires only about half the parameters.
{"title":"Action Assessment by Joint Relation Graphs","authors":"Jiahui Pan, Jibin Gao, Weishi Zheng","doi":"10.1109/ICCV.2019.00643","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00643","url":null,"abstract":"We present a new model to assess the performance of actions from videos, through graph-based joint relation modelling. Previous works mainly focused on the whole scene including the performer's body and background, yet they ignored the detailed joint interactions. This is insufficient for fine-grained, accurate action assessment, because the action quality of each joint is dependent of its neighbouring joints. Therefore, we propose to learn the detailed joint motion based on the joint relations. We build trainable Joint Relation Graphs, and analyze joint motion on them. We propose two novel modules, the Joint Commonality Module and the Joint Difference Module, for joint motion learning. The Joint Commonality Module models the general motion for certain body parts, and the Joint Difference Module models the motion differences within body parts. We evaluate our method on six public Olympic actions for performance assessment. Our method outperforms previous approaches (+0.0912) and the whole-scene analysis (+0.0623) in the Spearman's Rank Correlation. We also demonstrate our model's ability to interpret the action assessment process.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"25 1","pages":"6330-6339"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82447517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Quasi-Globally Optimal and Efficient Vanishing Point Estimation in Manhattan World
Haoang Li, Ji Zhao, J. Bazin, Wen Chen, Zhe Liu, Yunhui Liu
2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 1646-1654. DOI: 10.1109/ICCV.2019.00173
Abstract: The image lines projected from parallel 3D lines intersect at a common point called the vanishing point (VP). The Manhattan world assumption holds for scenes with three orthogonal VPs. In a Manhattan world, given several lines in a calibrated image, we aim at clustering them by three unknown-but-sought VPs. VP estimation can be reformulated as computing the rotation between the Manhattan frame and the camera frame. To compute this rotation, state-of-the-art methods are based on either data sampling or parameter search, and they fail to guarantee accuracy and efficiency simultaneously. In contrast, we propose to hybridize these two strategies. We first compute two degrees of freedom (DOF) of the above rotation from two sampled image lines, and then search for the optimal third DOF based on branch-and-bound. Our sampling accelerates our search by reducing the search space and simplifying the bound computation. Our search is not sensitive to noise and achieves quasi-global optimality in terms of maximizing the number of inliers. Experiments on synthetic and real-world images show that our method outperforms state-of-the-art approaches in terms of accuracy and/or efficiency.
{"title":"Wavelet Domain Style Transfer for an Effective Perception-Distortion Tradeoff in Single Image Super-Resolution","authors":"Xin Deng, Ren Yang, Mai Xu, P. Dragotti","doi":"10.1109/ICCV.2019.00317","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00317","url":null,"abstract":"In single image super-resolution (SISR), given a low-resolution (LR) image, one wishes to find a high-resolution (HR) version of it which is both accurate and photorealistic. Recently, it has been shown that there exists a fundamental tradeoff between low distortion and high perceptual quality, and the generative adversarial network (GAN) is demonstrated to approach the perception-distortion (PD) bound effectively. In this paper, we propose a novel method based on wavelet domain style transfer (WDST), which achieves a better PD tradeoff than the GAN based methods. Specifically, we propose to use 2D stationary wavelet transform (SWT) to decompose one image into low-frequency and high-frequency sub-bands. For the low-frequency sub-band, we improve its objective quality through an enhancement network. For the high-frequency sub-band, we propose to use WDST to effectively improve its perceptual quality. By feat of the perfect reconstruction property of wavelets, these sub-bands can be re-combined to obtain an image which has simultaneously high objective and perceptual quality. The numerical results on various datasets show that our method achieves the best trade-off between the distortion and perceptual quality among the existing state-of-the-art SISR methods.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"40 1","pages":"3076-3085"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81548857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Controllable Attention for Structured Layered Video Decomposition
Jean-Baptiste Alayrac, João Carreira, Relja Arandjelović, Andrew Zisserman
2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 5733-5742. DOI: 10.1109/ICCV.2019.00583
Abstract: The objective of this paper is to separate a video into its natural layers and to control which of the separated layers to attend to, for example in order to separate reflections, transparency, or object motion. We make the following three contributions: (i) we introduce a new structured neural network architecture that explicitly incorporates layers (as spatial masks) into its design, which improves separation performance over previous general-purpose networks for this task; (ii) we demonstrate that we can augment the architecture to leverage external cues such as audio for controllability and to help disambiguation; and (iii) we experimentally demonstrate the effectiveness of our approach and training procedure with controlled experiments, while also showing that the proposed model can be successfully applied to real-world applications such as reflection removal and action recognition in cluttered scenes.
{"title":"Mixture-Kernel Graph Attention Network for Situation Recognition","authors":"M. Suhail, L. Sigal","doi":"10.1109/ICCV.2019.01046","DOIUrl":"https://doi.org/10.1109/ICCV.2019.01046","url":null,"abstract":"Understanding images beyond salient actions involves reasoning about scene context, objects, and the roles they play in the captured event. Situation recognition has recently been introduced as the task of jointly reasoning about the verbs (actions) and a set of semantic-role and entity (noun) pairs in the form of action frames. Labeling an image with an action frame requires an assignment of values (nouns) to the roles based on the observed image content. Among the inherent challenges are the rich conditional structured dependencies between the output role assignments and the overall semantic sparsity. In this paper, we propose a novel mixture-kernel attention graph neural network (GNN) architecture designed to address these challenges. Our GNN enables dynamic graph structure during training and inference, through the use of a graph attention mechanism, and context-aware interactions between role pairs. We illustrate the efficacy of our model and design choices by conducting experiments on imSitu benchmark dataset, with accuracy improvements of up to 10% over the state-of-the-art.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"19 1","pages":"10362-10371"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82598618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
C-MIDN: Coupled Multiple Instance Detection Network With Segmentation Guidance for Weakly Supervised Object Detection
Gao Yan, B. Liu, Nan Guo, Xiaochun Ye, Fang Wan, Haihang You, Dongrui Fan
2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 9833-9842. DOI: 10.1109/ICCV.2019.00993
Abstract: Weakly supervised object detection (WSOD), which needs only image-level annotations, has attracted much attention recently. By combining convolutional neural networks with multiple instance learning, the Multiple Instance Detection Network (MIDN) has become the most popular approach to the WSOD problem and has been adopted as the initial model in many works. We argue that MIDN tends to converge to the most discriminative object parts, which limits the performance of methods based on it. In this paper, we propose a novel Coupled Multiple Instance Detection Network (C-MIDN) to address this problem. Specifically, we use a pair of MIDNs that work in a complementary manner with proposal removal. The localization information of the MIDNs is further coupled to obtain tighter bounding boxes and localize multiple objects. We also introduce a Segmentation Guided Proposal Removal (SGPR) algorithm to guarantee the MIL constraint after the removal and ensure the robustness of C-MIDN. Through a simple implementation of C-MIDN with online detector refinement, we obtain 53.6% and 50.3% mAP on the challenging PASCAL VOC 2007 and 2012 benchmarks respectively, significantly outperforming the previous state of the art.
Learning Meshes for Dense Visual SLAM
Michael Bloesch, Tristan Laidlow, R. Clark, Stefan Leutenegger, A. Davison
2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 5854-5863. DOI: 10.1109/ICCV.2019.00595
Abstract: Estimating the motion and surrounding geometry of a moving camera remains a challenging inference problem. From an information-theoretic point of view, estimates should get better as more information is included, as is done in dense SLAM, but this is strongly dependent on the validity of the underlying models. In the present paper, we use triangular meshes as a both compact and dense geometry representation. To allow for simple and fast usage, we propose a view-based formulation in which we predict the in-plane vertex coordinates directly from images and then employ the remaining vertex depth components as free variables. Flexible and continuous integration of information is achieved through a residual-based inference technique. This so-called factor graph encodes all information as mappings from free variables to residuals, the squared sum of which is minimised during inference. We propose the use of different types of learnable residuals, which are trained end-to-end to increase their suitability as information-bearing models and to enable accurate and reliable estimation. Detailed evaluation of all components on both synthetic and real data confirms the practicability of the presented approach.
{"title":"Localization of Deep Inpainting Using High-Pass Fully Convolutional Network","authors":"Haodong Li, Jiwu Huang","doi":"10.1109/ICCV.2019.00839","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00839","url":null,"abstract":"Image inpainting has been substantially improved with deep learning in the past years. Deep inpainting can fill image regions with plausible contents, which are not visually apparent. Although inpainting is originally designed to repair images, it can even be used for malicious manipulations, e.g., removal of specific objects. Therefore, it is necessary to identify the presence of inpainting in an image. This paper presents a method to locate the regions manipulated by deep inpainting. The proposed method employs a fully convolutional network that is based on high-pass filtered image residuals. Firstly, we analyze and observe that the inpainted regions are more distinguishable from the untouched ones in the residual domain. Hence, a high-pass pre-filtering module is designed to get image residuals for enhancing inpainting traces. Then, a feature extraction module, which learns discriminative features from image residuals, is built with four concatenated ResNet blocks. The learned feature maps are finally enlarged by an up-sampling module, so that a pixel-wise inpainting localization map is obtained. The whole network is trained end-to-end with a loss addressing the class imbalance. Extensive experimental results evaluated on both synthetic and realistic images subjected to deep inpainting have shown the effectiveness of the proposed method.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"2003 1","pages":"8300-8309"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82892753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Convex Shape Prior for Multi-Object Segmentation Using a Single Level Set Function
Shousheng Luo, X. Tai, Limei Huo, Yang Wang, R. Glowinski
2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 613-621. DOI: 10.1109/ICCV.2019.00070
Abstract: Many objects in the real world have convex shapes. It is a difficult task to find representations for convex shapes that admit good and fast numerical solutions. This paper proposes a method to incorporate a convex shape prior for multi-object segmentation using the level set method. The relationship between the convexity of the segmented objects and the signed distance function corresponding to their union is analyzed theoretically. This result is combined with the Gaussian mixture method for multi-object segmentation with a convexity shape prior. The alternating direction method of multipliers (ADMM) is adopted to solve the proposed model. Special boundary conditions are also imposed to obtain efficient algorithms for the fourth-order partial differential equations arising in one step of the ADMM algorithm. In addition, our method needs only one level set function regardless of the number of objects, so an increase in the number of objects does not increase the model or algorithm complexity. Various numerical experiments illustrate the performance and advantages of the proposed method.