{"title":"Domain-Transformable Sparse Representation for Anomaly Detection in Moving-Camera Videos.","authors":"Eric Jardim, Lucas A Thomaz, Eduardo A B da Silva, Sergio L Netto","doi":"10.1109/TIP.2019.2940686","DOIUrl":"10.1109/TIP.2019.2940686","url":null,"abstract":"<p><p>This paper presents a special matrix factorization based on sparse representation that detects anomalies in video sequences generated with moving cameras. Such representation is made by associating the frames of the target video, that is a sequence to be tested for the presence of anomalies, with the frames of an anomaly-free reference video, which is a previously validated sequence. This factorization is done by a sparse coefficient matrix, and any target-video anomaly is encapsulated into a residue term. In order to cope with camera trepidations, domaintransformations are incorporated into the sparse representation process. Approximations of the transformed-domain optimization problem are introduced to turn it into a feasible iterative process. Results obtained from a comprehensive video database acquired with moving cameras on a visually cluttered environment indicate that the proposed algorithm provides a better geometric registration between reference and target videos, greatly improving the overall performance of the anomaly-detection system.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62587136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hazy Image Decolorization with Color Contrast Restoration.","authors":"Wei Wang, Zhengguo Li, Shiqian Wu, Liangcai Zeng","doi":"10.1109/TIP.2019.2939946","DOIUrl":"10.1109/TIP.2019.2939946","url":null,"abstract":"<p><p>It is challenging to convert a hazy color image into a gray-scale image because the color contrast field of a hazy image is distorted. In this paper, a novel decolorization algorithm is proposed to transfer a hazy image into a distortionrecovered gray-scale image. To recover the color contrast field, the relationship between the restored color contrast and its distorted input is presented in CIELab color space. Based on this restoration, a nonlinear optimization problem is formulated to construct the resultant gray-scale image. A new differentiable approximation solution is introduced to solve this problem with an extension of the Huber loss function. Experimental results show that the proposed algorithm effectively preserves the global luminance consistency while represents the original color contrast in gray-scales, which is very close to the corresponding ground truth gray-scale one.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62586658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quality Measurement of Images on Mobile Streaming Interfaces Deployed at Scale.","authors":"Zeina Sinno, Anush Moorthy, Jan De Cock, Zhi Li, Alan C Bovik","doi":"10.1109/TIP.2019.2939733","DOIUrl":"10.1109/TIP.2019.2939733","url":null,"abstract":"<p><p>With the growing use of smart cellular devices for entertainment purposes, audio and video streaming services now offer an increasingly wide variety of popular mobile applications that offer portable and accessible ways to consume content. The user interfaces of these applications have become increasingly visual in nature, and are commonly loaded with dense multimedia content such as thumbnail images, animated GIFs, and short videos. To efficiently render these and to aid rapid download to the client display, it is necessary to compress, scale and color subsample them. These operations introduce distortions, reducing the appeal of the application. It is desirable to be able to automatically monitor and govern the visual qualities of these small images, which are usually small images. However, while there exists a variety of high-performing image quality assessment (IQA) algorithms, none have been designed for this particular use case. This kind of content often has unique characteristics, such as overlaid graphics, intentional brightness, gradients, text, and warping. We describe a study we conducted on the subjective and objective quality of images embedded in the displayed user interfaces of mobile streaming applications. We created a database of typical \"billboard\" and \"thumbnail\" images viewed on such services. Using the collected data, we studied the effects of compression, scaling and chroma-subsampling on perceived quality by conducting a subjective study. We also evaluated the performance of leading picture quality prediction models on the new database. We report some surprising results regarding algorithm performance, and find that there remains ample scope for future model development.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62586518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Color Control Functions for Multiprimary Displays I: Robustness Analysis and Optimization Formulations.","authors":"Carlos Eduardo Rodriguez-Pardo, Gaurav Sharma","doi":"10.1109/TIP.2019.2937067","DOIUrl":"10.1109/TIP.2019.2937067","url":null,"abstract":"<p><p>Color management for a multiprimary display requires, as a fundamental step, the determination of a color control function (CCF) that specifies control values for reproducing each color in the display's gamut. Multiprimary displays offer alternative choices of control values for reproducing a color in the interior of the gamut and accordingly alternative choices of CCFs. Under ideal conditions, alternative CCFs render colors identically. However, deviations in the spectral distributions of the primaries and the diversity of cone sensitivities among observers impact alternative CCFs differently, and, in particular, make some CCFs prone to artifacts in rendered images. We develop a framework for analyzing robustness of CCFs for multiprimary displays against primary and observer variations, incorporating a common model of human color perception. Using the framework, we propose analytical and numerical approaches for determining robust CCFs. First, via analytical development, we: (a) demonstrate that linearity of the CCF in tristimulus space endows it with resilience to variations, particularly, linearity can ensure invariance of the gray axis, (b) construct an axially linear CCF that is defined by the property of linearity over constant chromaticity loci, and (c) obtain an analytical form for the axially linear CCF that demonstrates it is continuous but suffers from the limitation that it does not have continuous derivatives. Second, to overcome the limitation of the axially linear CCF, we motivate and develop two variational objective functions for optimization of multiprimary CCFs, the first aims to preserve color transitions in the presence of primary/observer variations and the second combines this objective with desirable invariance along the gray axis, by incorporating the axially linear CCF. A companion Part II paper, presents an algorithmic approach for numerically computing optimal CCFs for the two alternative variational objective functions proposed here and presents results comparing alternative CCFs for several different 4,5, and 6 primary designs.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62585578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tunable VVC Frame Partitioning based on Lightweight Machine Learning.","authors":"Thomas Amestoy, Alexandre Mercat, Wassim Hamidouche, Daniel Menard, Cyril Bergeron","doi":"10.1109/TIP.2019.2938670","DOIUrl":"10.1109/TIP.2019.2938670","url":null,"abstract":"<p><p>Block partition structure is a critical module in video coding scheme to achieve significant gap of compression performance. Under the exploration of the future video coding standard, named Versatile Video Coding (VVC), a new Quad Tree Binary Tree (QTBT) block partition structure has been introduced. In addition to the QT block partitioning defined in High Efficiency Video Coding (HEVC) standard, new horizontal and vertical BT partitions are enabled, which drastically increases the encoding time compared to HEVC. In this paper, we propose a lightweight and tunable QTBT partitioning scheme based on a Machine Learning (ML) approach. The proposed solution uses Random Forest classifiers to determine for each coding block the most probable partition modes. To minimize the encoding loss induced by misclassification, risk intervals for classifier decisions are introduced in the proposed solution. By varying the size of risk intervals, tunable trade-off between encoding complexity reduction and coding loss is achieved. The proposed solution implemented in the JEM-7.0 software offers encoding complexity reductions ranging from 30average for only 0.7% to 3.0% Bjxntegaard Delta Rate (BDBR) increase in Random Access (RA) coding configuration, with very slight overhead induced by Random Forest. The proposed solution based on Random Forest classifiers is also efficient to reduce the complexity of the Multi-Type Tree (MTT) partitioning scheme under the VTM-5.0 software, with complexity reductions ranging from 25% to 61% in average for only 0.4% to 2.2% BD-BR increase.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62586590","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploiting Related and Unrelated Tasks for Hierarchical Metric Learning and Image Classification.","authors":"Yu Zheng, Jianping Fan, Ji Zhang, Xinbo Gao","doi":"10.1109/TIP.2019.2938321","DOIUrl":"10.1109/TIP.2019.2938321","url":null,"abstract":"<p><p>In multi-task learning, multiple interrelated tasks are jointly learned to achieve better performance. In many cases, if we can identify which tasks are related, we can also clearly identify which tasks are unrelated. In the past, most researchers emphasized exploiting correlations among interrelated tasks while completely ignoring the unrelated tasks that may provide valuable prior knowledge for multi-task learning. In this paper, a new approach is developed to hierarchically learn a tree of multi-task metrics by leveraging prior knowledge about both the related tasks and unrelated tasks. First, a visual tree is constructed to hierarchically organize large numbers of image categories in a coarse-to-fine fashion. Over the visual tree, a multi-task metric classifier is learned for each node by exploiting both the related and unrelated tasks, where the learning tasks for training the classifiers for the sibling child nodes under the same parent node are treated as the interrelated tasks, and the others are treated as the unrelated tasks. In addition, the node-specific metric for the parent node is propagated to its sibling child nodes to control inter-level error propagation. Our experimental results demonstrate that our hierarchical metric learning algorithm achieves better results than other state-of-the-art algorithms.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62586099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Cascade Model based Face Recognition: When Deep-layered Learning Meets Small Data.","authors":"Lei Zhang, Ji Liu, Bob Zhanga, David Zhangb, Ce Zhu","doi":"10.1109/TIP.2019.2938307","DOIUrl":"10.1109/TIP.2019.2938307","url":null,"abstract":"<p><p>Sparse representation based classification (SRC), nuclear-norm matrix regression (NMR), and deep learning (DL) have achieved a great success in face recognition (FR). However, there still exist some intrinsic limitations among them. SRC and NMR based coding methods belong to one-step model, such that the latent discriminative information of the coding error vector cannot be fully exploited. DL, as a multi-step model, can learn powerful representation, but relies on large-scale data and computation resources for numerous parameters training with complicated back-propagation. Straightforward training of deep neural networks from scratch on small-scale data is almost infeasible. Therefore, in order to develop efficient algorithms that are specifically adapted for small-scale data, we propose to derive the deep models of SRC and NMR. Specifically, in this paper, we propose an end-to-end deep cascade model (DCM) based on SRC and NMR with hierarchical learning, nonlinear transformation and multi-layer structure for corrupted face recognition. The contributions include four aspects. First, an end-to-end deep cascade model for small-scale data without back-propagation is proposed. Second, a multi-level pyramid structure is integrated for local feature representation. Third, for introducing nonlinear transformation in layer-wise learning, softmax vector coding of the errors with class discrimination is proposed. Fourth, the existing representation methods can be easily integrated into our DCM framework. Experiments on a number of small-scale benchmark FR datasets demonstrate the superiority of the proposed model over state-of-the-art counterparts. Additionally, a perspective that deep-layered learning does not have to be convolutional neural network with back-propagation optimization is consolidated. The demo code is available in https://github.com/liuji93/DCM.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62586446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiple Cycle-in-Cycle Generative Adversarial Networks for Unsupervised Image Super-Resolution.","authors":"Yongbing Zhang, Siyuan Liu, Chao Dong, Xinfeng Zhang, Yuan Yuan","doi":"10.1109/TIP.2019.2938347","DOIUrl":"10.1109/TIP.2019.2938347","url":null,"abstract":"<p><p>With the help of convolutional neural networks (CNN), the single image super-resolution problem has been widely studied. Most of these CNN based methods focus on learning a model to map a low-resolution (LR) image to a highresolution (HR) image, where the LR image is downsampled from the HR image with a known model. However, in a more general case when the process of the down-sampling is unknown and the LR input is degraded by noises and blurring, it is difficult to acquire the LR and HR image pairs for traditional supervised learning. Inspired by the recent unsupervised imagestyle translation applications using unpaired data, we propose a multiple Cycle-in-Cycle network structure to deal with the more general case using multiple generative adversarial networks (GAN) as the basis components. The first network cycle aims at mapping the noisy and blurry LR input to a noise-free LR space, then a new cycle with a well-trained ×2 network model is orderly introduced to super-resolve the intermediate output of the former cycle. The number of total cycles depends on the different up-sampling factors (×2, ×4, ×8). Finally, all modules are trained in an end-to-end manner to get the desired HR output. Quantitative indexes and qualitative results show that our proposed method achieves comparable performance with the state-of-the-art supervised models.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62586945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Edge-Sensitive Human Cutout with Hierarchical Granularity and Loopy Matting Guidance.","authors":"Jingwen Ye, Yongcheng Jing, Xinchao Wang, Kairi Ou, Dacheng Tao, Mingli Song","doi":"10.1109/TIP.2019.2930146","DOIUrl":"10.1109/TIP.2019.2930146","url":null,"abstract":"<p><p>Human parsing and matting play important roles in various applications, such as dress collocation, clothing recommendation, and image editing. In this paper, we propose a lightweight hybrid model that unifies the fully-supervised hierarchical-granularity parsing task and the unsupervised matting one. Our model comprises two parts, the extensible hierarchical semantic segmentation block using CNN and the matting module composed of guided filters. Given a human image, the segmentation block stage-1 first obtains a primitive segmentation map to separate the human and the background. The primitive segmentation is then fed into stage-2 together with the original image to give a rough segmentation of human body. This procedure is repeated in the stage-3 to acquire a refined segmentation. The matting module takes as input the above estimated segmentation maps and produces the matting map, in a fully unsupervised manner. The obtained matting map is then in turn fed back to the CNN in the first block for refining the semantic segmentation results.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62583550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-View Video Synopsis via Simultaneous Object-Shifting and View-Switching Optimization.","authors":"Zhensong Zhang, Yongwei Nie, Hanqiu Sun, Qing Zhang, Qiuxia Lai, Guiqing Li, Mingyu Xiao","doi":"10.1109/TIP.2019.2938086","DOIUrl":"10.1109/TIP.2019.2938086","url":null,"abstract":"<p><p>We present a method for synopsizing multiple videos captured by a set of surveillance cameras with some overlapped field-of-views. Currently, object-based approaches that directly shift objects along the time axis are already able to compute compact synopsis results for multiple surveillance videos. The challenge is how to present the multiple synopsis results in a more compact and understandable way. Previous approaches show them side by side on the screen, which however is difficult for user to comprehend. In this paper, we solve the problem by joint object-shifting and camera view-switching. Firstly, we synchronize the input videos, and group the same object in different videos together. Then we shift the groups of objects along the time axis to obtain multiple synopsis videos. Instead of showing them simultaneously, we just show one of them at each time, and allow to switch among the views of different synopsis videos. In this view switching way, we obtain just a single synopsis results consisting of content from all the input videos, which is much easier for user to follow and understand. To obtain the best synopsis result, we construct a simultaneous object-shifting and view-switching optimization framework instead of solving them separately. We also present an alternative optimization strategy composed of graph cuts and dynamic programming to solve the unified optimization. Experiments demonstrate that our single synopsis video generated from multiple input videos is compact, complete, and easy to understand.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62586355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}