Ion Martinikorena, Andoni Larumbe-Bergera, Mikel Ariz, Sonia Porta, Rafael Cabeza, Arantxa Villanueva
{"title":"Low cost gaze estimation: knowledge-based solutions.","authors":"Ion Martinikorena, Andoni Larumbe-Bergera, Mikel Ariz, Sonia Porta, Rafael Cabeza, Arantxa Villanueva","doi":"10.1109/TIP.2019.2946452","DOIUrl":"10.1109/TIP.2019.2946452","url":null,"abstract":"<p><p>Eye tracking in low-resolution scenarios is not yet a fully solved problem. Using eye tracking in a mobile gadget is a challenging objective that would allow this technology to spread to unexplored fields. In this paper, a knowledge-based approach is presented to solve gaze estimation in low-resolution settings. Understanding the high-resolution paradigm makes it possible to propose alternative models for gaze estimation. Three models are presented as solutions for gaze estimation in remote low-resolution systems: a geometrical model, an interpolation model and a compound model. Since this work considers head position essential to improving gaze accuracy, a method for head pose estimation is also proposed. The methods are validated in an optimal framework, the I2Head database, which combines head and gaze data. The experimental validation of the models demonstrates their sensitivity to image processing inaccuracies, which is critical in the case of the geometrical model. Static and extreme movement scenarios are analyzed, showing the higher robustness of the compound and geometrical models in the presence of user displacement.
Accuracy values of about 3° have been obtained, increasing to values close to 5° in extreme displacement settings, results fully comparable with the state-of-the-art.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62590226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-Motion-Assisted Tensor Completion Method for Background Initialization in Complex Video Sequences.","authors":"Ibrahim Kajo, Nidal Kamel, Yassine Ruichek","doi":"10.1109/TIP.2019.2946098","DOIUrl":"10.1109/TIP.2019.2946098","url":null,"abstract":"<p><p>The background initialization (BI) problem has attracted the attention of researchers in different image/video processing fields. Recently, a tensor-based technique called spatiotemporal slice-based singular value decomposition (SS-SVD) has been proposed for background initialization. SS-SVD applies the SVD to the tensor slices and estimates the background from low-rank information. Despite its efficiency in background initialization, the performance of SS-SVD requires further improvement in the case of complex sequences with challenges such as stationary foreground objects (SFOs), illumination changes, low frame-rate, and clutter. In this paper, a self-motion-assisted tensor completion method is proposed to overcome the limitations of SS-SVD in complex video sequences and enhance the visual appearance of the initialized background. With the proposed method, the motion information, extracted from the sparse portion of the tensor slices, is combined with the low-rank information of SS-SVD to eliminate existing artifacts in the initialized background. Efficient blending schemes between the low-rank (background) and sparse (foreground) information of the tensor slices are developed for scenarios such as SFO removal, lighting variation processing, low frame-rate processing, crowdedness estimation, and best frame selection. The performance of the proposed method on video sequences with complex scenarios is compared with the top-ranked state-of-the-art techniques in the field of background initialization.
The results not only validate the improved performance over the majority of the tested challenges but also demonstrate the capability of the proposed method to initialize the background in less computational time.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62590002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Low-Rank Tensor Minimization via a New Tensor Spectral k-Support Norm.","authors":"Jian Lou, Yiu-Ming Cheung","doi":"10.1109/TIP.2019.2946445","DOIUrl":"10.1109/TIP.2019.2946445","url":null,"abstract":"<p><p>Recently, based on a new tensor algebraic framework for third-order tensors, the tensor singular value decomposition (t-SVD) and its associated tubal rank definition have shed new light on low-rank tensor modeling. Its applications to robust image/video recovery and background modeling show promising performance due to its superior capability in modeling cross-channel/frame information. Under the t-SVD framework, we propose a new tensor norm called tensor spectral k-support norm (TSP-k) by an alternative convex relaxation. As an interpolation between the existing tensor nuclear norm (TNN) and tensor Frobenius norm (TFN), it is able to simultaneously drive minor singular values to zero to induce low-rankness, and to capture more global information for better preserving intrinsic structure. We provide the proximal operator and the polar operator for the TSP-k norm as key optimization blocks, along with two showcase optimization algorithms for medium- and large-size tensors.
Experiments on medium- and large-size synthetic, image and video datasets verify the superiority of the TSP-k norm and the effectiveness of both optimization methods in comparison with the existing counterparts.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62590174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kun Hu, Zhiyong Wang, Wei Wang, Kaylena A Ehgoetz Martens, Liang Wang, Tieniu Tan, Simon J G Lewis, David Dagan Feng
{"title":"Graph Sequence Recurrent Neural Network for Vision-based Freezing of Gait Detection.","authors":"Kun Hu, Zhiyong Wang, Wei Wang, Kaylena A Ehgoetz Martens, Liang Wang, Tieniu Tan, Simon J G Lewis, David Dagan Feng","doi":"10.1109/TIP.2019.2946469","DOIUrl":"10.1109/TIP.2019.2946469","url":null,"abstract":"<p><p>Freezing of gait (FoG) is one of the most common symptoms of Parkinson's disease (PD), a neurodegenerative disorder which impacts millions of people around the world. Accurate assessment of FoG is critical for the management of PD and for evaluating the efficacy of treatments. Currently, the assessment of FoG requires well-trained experts to perform time-consuming annotations via vision-based observations. Thus, automatic FoG detection algorithms are needed. In this study, we formulate vision-based FoG detection as a fine-grained graph sequence modelling task by representing the anatomic joints in each temporal segment with a directed graph, since FoG events can be observed through the motion patterns of joints. A novel deep learning method, namely the graph sequence recurrent neural network (GS-RNN), is proposed to characterize the FoG patterns by devising graph recurrent cells, which take graph sequences of dynamic structures as inputs. For cases in which prior edge annotations are not available, a data-driven adjacency estimation method is further proposed. To the best of our knowledge, this is one of the first studies on vision-based FoG detection using deep neural networks designed for graph sequences of dynamic structures.
Experimental results on more than 150 videos collected from 45 patients demonstrated promising performance of the proposed GS-RNN for FoG detection with an AUC value of 0.90.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62590329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Guangcheng Wang, Zhongyuan Wang, Ke Gu, Leida Li, Zhifang Xia, Lifang Wu
{"title":"Blind Quality Metric of DIBR-Synthesized Images in the Discrete Wavelet Transform Domain.","authors":"Guangcheng Wang, Zhongyuan Wang, Ke Gu, Leida Li, Zhifang Xia, Lifang Wu","doi":"10.1109/TIP.2019.2945675","DOIUrl":"10.1109/TIP.2019.2945675","url":null,"abstract":"<p><p>Free viewpoint video (FVV) has received considerable attention owing to its widespread applications in several areas such as immersive entertainment, remote surveillance and distance education. Since FVV images are synthesized via a depth image-based rendering (DIBR) procedure in the \"blind\" environment (without reference images), a real-time and reliable blind quality assessment metric is urgently required. However, the existing image quality assessment metrics are insensitive to the geometric distortions engendered by DIBR. In this research, a novel blind quality assessment method for DIBR-synthesized images is proposed based on measuring geometric distortion, global sharpness and image complexity. First, a DIBR-synthesized image is decomposed into wavelet subbands using the discrete wavelet transform. Then, the Canny operator is employed to detect the edges of the binarized low-frequency subband and high-frequency subbands. The edge similarities between the binarized low-frequency subband and high-frequency subbands are further computed to quantify geometric distortions in DIBR-synthesized images. Second, the log-energies of wavelet subbands are calculated to evaluate global sharpness in DIBR-synthesized images. Third, a hybrid filter combining the autoregressive and bilateral filters is adopted to compute image complexity. Finally, the overall quality score is derived by normalizing geometric distortion and global sharpness by the image complexity.
Experiments show that our proposed quality method is superior to the competing reference-free state-of-the-art DIBR-synthesized image quality models.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62590363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallax Tolerant Light Field Stitching for Hand-held Plenoptic Cameras.","authors":"Xin Jin, Pei Wang, Qionghai Dai","doi":"10.1109/TIP.2019.2945687","DOIUrl":"10.1109/TIP.2019.2945687","url":null,"abstract":"<p><p>Light field (LF) stitching is a potential solution to improve the field of view (FOV) for hand-held plenoptic cameras. Existing LF stitching methods cannot provide accurate registration for scenes with large depth variation. In this paper, a novel LF stitching method is proposed to handle parallax in the LFs more flexibly and accurately. First, a depth layer map (DLM) is proposed to guarantee adequate feature points on each depth layer. For the regions of nondeterministic depth, a superpixel layer map (SLM) is proposed based on LF spatial correlation analysis to refine the depth layer assignments. Then, DLM-SLM-based LF registration is proposed to derive the location-dependent homography transforms accurately and to warp the LFs to their corresponding positions without parallax interference. 4D graph-cut is further applied to fuse the registration results for higher LF spatial continuity and angular continuity.
Horizontal, vertical and multi-LF stitching are tested for different scenes, which demonstrates the superior performance provided by the proposed method in terms of subjective quality of the stitched LFs, epipolar plane image consistency in the stitched LF, and perspective-averaged correlation between the stitched LF and the input LFs.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62590450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alessandro Artusi, Francesco Banterle, Fabio Carrara, Alejandro Moreo
{"title":"Efficient Evaluation of Image Quality via Deep-Learning Approximation of Perceptual Metrics.","authors":"Alessandro Artusi, Francesco Banterle, Fabio Carrara, Alejandro Moreo","doi":"10.1109/TIP.2019.2944079","DOIUrl":"10.1109/TIP.2019.2944079","url":null,"abstract":"<p><p>Image metrics based on the Human Visual System (HVS) play a remarkable role in the evaluation of complex image processing algorithms. However, mimicking the HVS is known to be complex and computationally expensive (both in terms of time and memory), and the use of such metrics is thus limited to a few applications and to small input data. All of this makes such metrics not fully attractive in real-world scenarios. To address these issues, we propose Deep Image Quality Metric (DIQM), a deep-learning approach to learn the global image quality feature (mean-opinion-score). DIQM can emulate existing visual metrics efficiently, reducing the computational costs by more than an order of magnitude with respect to existing implementations.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62589666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploiting Block-sparsity for Hyperspectral Kronecker Compressive Sensing: a Tensor-based Bayesian Method.","authors":"Rongqiang Zhao, Qiang Wang, Jun Fu, Luquan Ren","doi":"10.1109/TIP.2019.2944722","DOIUrl":"10.1109/TIP.2019.2944722","url":null,"abstract":"<p><p>Bayesian methods are attracting increasing attention in the field of compressive sensing (CS), as they can recover signals from random measurements. However, these methods have limited use in many tensor-based cases such as hyperspectral Kronecker compressive sensing (HKCS), because they exploit the sparsity in only one dimension. In this paper, we propose a novel Bayesian model for HKCS in an attempt to overcome the above limitation. The model exploits multi-dimensional block-sparsity such that the information redundancies in all dimensions are eliminated. Laplace prior distributions are employed for sparse coefficients in each dimension, and their coupling is consistent with the multi-dimensional block-sparsity model. Based on the proposed model, we develop a tensor-based Bayesian reconstruction algorithm, which decouples the hyperparameters for each dimension via a low-complexity technique. Experimental results demonstrate that the proposed method is able to provide more accurate reconstruction than existing Bayesian methods at a satisfactory speed.
Additionally, the proposed method is not limited to HKCS; it also has the potential to be extended to other multi-dimensional CS applications and to multi-dimensional block-sparse-based data recovery.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62590166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xiuxiu Bai, Lele Ye, Jihua Zhu, Li Zhu, Taku Komura
{"title":"Skeleton Filter: A Self-Symmetric Filter for Skeletonization in Noisy Text Images.","authors":"Xiuxiu Bai, Lele Ye, Jihua Zhu, Li Zhu, Taku Komura","doi":"10.1109/TIP.2019.2944560","DOIUrl":"10.1109/TIP.2019.2944560","url":null,"abstract":"<p><p>Robustly computing the skeletons of objects in natural images is difficult due to the large variations in shape boundaries and the large amount of noise in the images. Inspired by recent findings in neuroscience, we propose the Skeleton Filter, which is a novel model for skeleton extraction from natural images. The Skeleton Filter consists of a pair of oppositely oriented Gabor-like filters; by applying the Skeleton Filter in various orientations to an image at multiple resolutions and fusing the results, our system can robustly extract the skeleton even under highly noisy conditions. We evaluate the performance of our approach using challenging noisy text datasets and demonstrate that our pipeline realizes state-of-the-art performance for extracting the text skeleton. Moreover, the presence of Gabor filters in the human visual system and the simple architecture of the Skeleton Filter can help explain the strong capabilities of humans in perceiving skeletons of objects, even under dramatically noisy conditions.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62589989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Coupled ISTA Network for Multi-modal Image Super-Resolution.","authors":"Xin Deng, Pier Luigi Dragotti","doi":"10.1109/TIP.2019.2944270","DOIUrl":"10.1109/TIP.2019.2944270","url":null,"abstract":"<p><p>Given a low-resolution (LR) image, multi-modal image super-resolution (MISR) aims to find the high-resolution (HR) version of this image with the guidance of an HR image from another modality. In this paper, we use a model-based approach to design a new deep network architecture for MISR. We first introduce a novel joint multi-modal dictionary learning (JMDL) algorithm to model cross-modality dependency. In JMDL, we simultaneously learn three dictionaries and two transform matrices to combine the modalities. Then, by unfolding the iterative shrinkage and thresholding algorithm (ISTA), we turn the JMDL model into a deep neural network, called deep coupled ISTA network. Since network initialization plays an important role in deep network training, we further propose a layer-wise optimization algorithm (LOA) to initialize the parameters of the network before running back-propagation. Specifically, we model the network initialization as a multi-layer dictionary learning problem, and solve it through convex optimization. The proposed LOA is demonstrated to effectively decrease the training loss and increase the reconstruction accuracy. Finally, we compare our method with other state-of-the-art methods in the MISR task.
The numerical results show that our method consistently outperforms others both quantitatively and qualitatively at different upscaling factors for various multi-modal scenarios.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62589478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
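The last abstract builds its network by unfolding the iterative shrinkage and thresholding algorithm (ISTA). As a rough illustration of that building block only, the sketch below runs plain ISTA on a small sparse least-squares problem. It is not the authors' deep coupled ISTA network or their JMDL model; the function names `soft_threshold` and `ista`, the objective, and the toy data are assumptions introduced here for illustration.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t (proximal operator of t * |v|)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(A, y, lam, n_iter=200):
    """Approximately minimize 0.5 * ||A x - y||^2 + lam * ||x||_1 with plain ISTA.

    A is a non-zero matrix given as a list of rows; y is a list of floats.
    """
    m, n = len(A), len(A[0])
    # The squared Frobenius norm upper-bounds the largest eigenvalue of A^T A,
    # so 1 / lipschitz is a safe (if conservative) step size.
    lipschitz = sum(a * a for row in A for a in row)
    step = 1.0 / lipschitz
    x = [0.0] * n
    for _ in range(n_iter):
        # Residual r = A x - y
        r = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        # Gradient of the smooth term: g = A^T r
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Gradient step followed by element-wise shrinkage
        x = [soft_threshold(x[j] - step * g[j], step * lam) for j in range(n)]
    return x


# Toy problem (hypothetical data): identity measurement matrix.
x_hat = ista([[1.0, 0.0], [0.0, 1.0]], [3.0, -0.5], lam=1.0)
```

With an identity matrix the minimizer is simply the soft-thresholded measurement, which makes the toy case easy to check by hand. In unfolded variants, each loop iteration typically becomes one network layer whose step size, threshold and matrices are learned rather than fixed.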