{"title":"Tone Mapping Beyond the Classical Receptive Field","authors":"Shaobing Gao, Min Tan, Zhen He, Yongjie Li","doi":"10.1109/TIP.2020.2970541","DOIUrl":"https://doi.org/10.1109/TIP.2020.2970541","url":null,"abstract":"Some neurons in the primary visual cortex (V1) of the human visual system (HVS) perform dynamic center-surround computation, which is thought to help compress high dynamic range (HDR) scenes and preserve their details. In this paper, we simulate this dynamic receptive field (RF) property of V1 neurons to solve the tone mapping (TM) task. The novelties of our method are as follows. (1) Cortical processing mechanisms of the HVS are modeled to build a local TM operation based on two Gaussian functions whose kernels and weights adapt according to the center-surround contrast, thus reducing halo artifacts and effectively enhancing the local details in the bright and dark parts of the image. (2) Our method uses an adaptive filter that follows the contrast levels of the image, which is computationally very efficient. (3) The local contrast drives a dynamic fusion of the center and surround responses returned by a cortical processing flow with the global signals returned by a sub-cortical processing flow, selectively enhancing the details. Extensive experiments show that the proposed method can efficiently render HDR scenes with good contrast, clear details, and high structural fidelity. The proposed method also achieves promising performance when applied to enhancing low-light images. 
Furthermore, because it models these biological mechanisms, our technique is simple and robust: all results were obtained using the same parameters across datasets (e.g., HDR images and low-light images), that is, mimicking how the HVS operates.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":"4174-4187"},"PeriodicalIF":10.6,"publicationDate":"2020-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TIP.2020.2970541","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48616807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
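The two-Gaussian center-surround idea in contribution (1) can be sketched in a few lines of NumPy/SciPy. This is an illustrative toy, not the authors' operator: the parameters `sigma_c`, `sigma_s`, and `alpha`, the log-domain normalization, and the contrast-driven blend are all assumptions made for the sketch.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def tone_map_center_surround(lum, sigma_c=1.0, sigma_s=8.0, alpha=4.0):
    """Toy center-surround tone mapping on an HDR luminance image.

    lum: positive HDR luminance. sigma_c / sigma_s are the center and
    surround Gaussian scales; alpha steers how strongly local contrast
    shifts the center/surround blend. All values are illustrative.
    """
    log_l = np.log1p(lum)
    center = gaussian_filter(log_l, sigma_c)
    surround = gaussian_filter(log_l, sigma_s)
    # Local contrast adapts the blend, loosely mimicking the dynamic
    # receptive-field weighting of the two Gaussians described above.
    contrast = np.abs(center - surround)
    w = contrast / (contrast + alpha * contrast.mean() + 1e-8)
    adapt = w * center + (1.0 - w) * surround
    mapped = log_l - adapt + adapt.mean()  # subtractive normalization in log domain
    mapped -= mapped.min()
    return mapped / (mapped.max() + 1e-8)  # normalize to display range [0, 1]
```

Because `w` grows with local contrast, high-contrast regions lean toward the narrow center Gaussian (detail preservation), while flat regions lean toward the wide surround (range compression), which is the qualitative behavior the abstract describes.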
{"title":"Ring Difference Filter for Fast and Noise Robust Depth From Focus","authors":"Hae-Gon Jeon, Jaeheung Surh, Sunghoon Im, I. Kweon","doi":"10.1109/TIP.2019.2937064","DOIUrl":"https://doi.org/10.1109/TIP.2019.2937064","url":null,"abstract":"Depth from focus (DfF) is a method of estimating the depth of a scene by using information acquired through changes in the focus of a camera. Within the DfF framework, the focus measure (FM) forms the foundation that determines the accuracy of the output. Given the results from the FM, the role of a DfF pipeline is to identify and recalculate unreliable measurements while enhancing those that are reliable. In this paper, we propose a new FM, which we call the “ring difference filter” (RDF), that can more accurately and robustly measure focus. FMs can usually be categorized as either confident local methods or noise-robust non-local methods. The RDF’s unique ring-and-disk structure allows it to have the advantages of both local and non-local FMs. We then describe an efficient pipeline that utilizes the RDF’s properties. Part of this pipeline is our proposed RDF-based cost aggregation method, which is able to robustly refine the initial results in the presence of image noise. 
Our method is able to reproduce results that are on par with or even better than those of state-of-the-art methods, while spending less time in computation.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":"1045-1060"},"PeriodicalIF":10.6,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TIP.2019.2937064","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62585977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
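The ring-and-disk structure lends itself to a compact sketch: compare the mean intensity of a central disk against that of a surrounding ring, so sharp (in-focus) regions with strong local structure respond strongly. The radii `r_in` and `r_out` below are hypothetical; the paper's exact filter design and cost aggregation are richer than this.

```python
import numpy as np
from scipy.ndimage import convolve

def ring_difference_filter(img, r_in=2, r_out=4):
    """Toy ring-minus-disk focus measure (illustrative radii).

    Returns, per pixel, the absolute difference between the mean
    intensity of a central disk and that of the surrounding ring.
    """
    y, x = np.ogrid[-r_out:r_out + 1, -r_out:r_out + 1]
    d2 = x * x + y * y
    disk = (d2 <= r_in * r_in).astype(float)
    ring = ((d2 > r_in * r_in) & (d2 <= r_out * r_out)).astype(float)
    disk /= disk.sum()  # normalize to averaging kernels
    ring /= ring.sum()
    return np.abs(convolve(img, disk) - convolve(img, ring))
```

On a defocused (locally smooth) region the disk and ring means nearly coincide and the response vanishes, while fine in-focus texture drives them apart, which is the local-FM behavior the abstract attributes to the RDF.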
{"title":"Semi-Linearized Proximal Alternating Minimization for a Discrete Mumford–Shah Model","authors":"Marion Foare, N. Pustelnik, Laurent Condat","doi":"10.1109/TIP.2019.2944561","DOIUrl":"https://doi.org/10.1109/TIP.2019.2944561","url":null,"abstract":"The Mumford–Shah model is a standard model in image segmentation; because it is difficult to optimize, many approximations have been proposed. The major interest of this functional is that it enables joint image restoration and contour detection. In this work, we propose a general formulation of the discrete counterpart of the Mumford–Shah functional, adapted to nonsmooth penalizations and fitting the assumptions required by the Proximal Alternating Linearized Minimization (PALM) algorithm, with convergence guarantees. A second contribution relaxes some of the assumptions on the involved functionals and derives a novel Semi-Linearized Proximal Alternating Minimization (SL-PAM) algorithm, with proven convergence. We compare the performance of the algorithm under several nonsmooth penalizations, for Gaussian and Poisson denoising, image restoration and RGB-color denoising. We compare the results with state-of-the-art convex relaxations of the Mumford–Shah functional, and a discrete version of the Ambrosio–Tortorelli functional. 
We show that the SL-PAM algorithm is faster than the original PALM algorithm, and leads to competitive denoising, restoration and segmentation results.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":"2176-2189"},"PeriodicalIF":10.6,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TIP.2019.2944561","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62590102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
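A minimal 1-D toy can make the joint restoration/contour-detection idea concrete. The sketch below alternately minimizes an Ambrosio–Tortorelli-style quadratic relaxation of a discrete Mumford–Shah energy with exact block updates; it is a didactic stand-in under assumed penalties, not the SL-PAM algorithm itself.

```python
import numpy as np

def ms_alternating_1d(f, beta=1.0, lam=0.01, iters=15):
    """Alternating minimization of the toy energy
        F(u, e) = ||u - f||^2 + beta * sum((1-e)^2 (Du)^2) + lam * sum(e^2),
    where Du is the forward difference and e in [0,1] marks contours.
    Both block updates are exact, so F is non-increasing per sweep.
    """
    n = len(f)
    u, e = f.astype(float).copy(), np.zeros(n - 1)
    energies = []
    for _ in range(iters):
        d = np.diff(u)
        # e-step: closed-form minimizer of the separable quadratic in e.
        e = beta * d ** 2 / (beta * d ** 2 + lam)
        # u-step: exact solve of the quadratic (I + D^T W D) u = f,
        # with W = diag(beta * (1 - e)^2).
        w = beta * (1.0 - e) ** 2
        A = np.eye(n)
        for i in range(n - 1):
            A[i, i] += w[i]
            A[i + 1, i + 1] += w[i]
            A[i, i + 1] -= w[i]
            A[i + 1, i] -= w[i]
        u = np.linalg.solve(A, f)
        energies.append(np.sum((u - f) ** 2)
                        + np.sum(w * np.diff(u) ** 2)
                        + lam * np.sum(e ** 2))
    return u, e, energies
```

Where `e` approaches 1 the smoothness weight `(1-e)^2` collapses, so jumps in `u` survive denoising; this is the joint restoration-plus-contour mechanism the abstract attributes to the Mumford–Shah functional.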
{"title":"Recent Advances in 3D Object Detection in the Era of Deep Neural Networks: A Survey","authors":"Mohammad Muntasir Rahman, Yanhao Tan, Jian Xue, K. Lu","doi":"10.1109/TIP.2019.2955239","DOIUrl":"https://doi.org/10.1109/TIP.2019.2955239","url":null,"abstract":"With the rapid development of deep learning technology and other powerful tools, 3D object detection has made great progress and become one of the fastest growing fields in computer vision. Many automated applications such as robotic navigation, autonomous driving, and virtual or augmented reality systems require accurate 3D object localization and detection. To meet this requirement, many methods have been proposed to improve the performance of 3D object localization and detection. Despite recent efforts, 3D object detection remains a very challenging task due to occlusion, viewpoint variations, scale changes, and limited information in 3D scenes. In this paper, we present a comprehensive review of recent state-of-the-art approaches in 3D object detection technology. We start with some basic concepts, then describe some of the available datasets that are designed to facilitate the performance evaluation of 3D object detection algorithms. Next, we review the state-of-the-art technologies in this area, highlighting their contributions, importance, and limitations as a guide for future research. 
Finally, we provide a quantitative comparison of the results of the state-of-the-art methods on the popular public datasets.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":"2947-2962"},"PeriodicalIF":10.6,"publicationDate":"2019-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TIP.2019.2955239","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41748035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-Order Feature Learning for Multi-Atlas Based Label Fusion: Application to Brain Segmentation With MRI","authors":"Liang Sun, Wei Shao, Mingliang Wang, Daoqiang Zhang, Mingxia Liu","doi":"10.1109/TIP.2019.2952079","DOIUrl":"https://doi.org/10.1109/TIP.2019.2952079","url":null,"abstract":"Multi-atlas based segmentation methods have shown their effectiveness in the segmentation of brain regions of interest (ROIs), by propagating labels from multiple atlases to a target image based on the similarity between patches in the target image and multiple atlas images. Most of the existing multi-atlas based methods use image intensity features to calculate the similarity between a pair of image patches for label fusion. In particular, using only low-level image intensity features cannot adequately characterize the complex appearance patterns (e.g., the high-order relationship between voxels within a patch) of brain magnetic resonance (MR) images. To address this issue, this paper develops a high-order feature learning framework for multi-atlas based label fusion, where high-order features of image patches are extracted and fused for segmenting ROIs of structural brain MR images. Specifically, an unsupervised feature learning method (i.e., means-covariances restricted Boltzmann machine, mcRBM) is employed to learn high-order features (i.e., mean and covariance features) of patches in brain MR images. Then, a group-fused sparsity dictionary learning method is proposed to jointly calculate the voting weights for label fusion, based on the learned high-order and the original image intensity features. The proposed method is compared with several state-of-the-art label fusion methods on ADNI, NIREP and LONI-LPBA40 datasets. 
The Dice ratios achieved by our method are 88.30% and 88.83% on the left and right hippocampus of the ADNI dataset, and 79.54% and 81.02% on the NIREP and LONI-LPBA40 datasets, respectively, while the best Dice ratios yielded by the other methods are 86.51%, 87.39%, 78.48% and 79.65% on the same tasks, respectively.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":"2702-2713"},"PeriodicalIF":10.6,"publicationDate":"2019-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TIP.2019.2952079","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62591284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
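The baseline that high-order feature learning improves upon, intensity-based weighted voting across atlas patches, can be sketched as follows. The Gaussian weighting and `sigma` are illustrative choices, not the paper's group-fused sparse dictionary scheme.

```python
import numpy as np

def patch_label_fusion(target_patch, atlas_patches, atlas_labels, sigma=0.5):
    """Label a target patch by similarity-weighted voting over atlas patches.

    atlas_patches: list of arrays the same shape as target_patch.
    atlas_labels: 1-D array with one class label per atlas patch.
    sigma: bandwidth of the Gaussian similarity kernel (illustrative).
    """
    # Sum-of-squared-differences distance from the target to each atlas patch.
    dists = np.array([np.sum((target_patch - p) ** 2) for p in atlas_patches])
    w = np.exp(-dists / (2.0 * sigma ** 2))
    w /= w.sum()  # normalize voting weights
    labels = np.unique(atlas_labels)
    votes = np.array([w[atlas_labels == c].sum() for c in labels])
    return labels[np.argmax(votes)]
```

The paper's contribution is, in effect, to replace the raw-intensity distance above with distances in a learned high-order (mcRBM mean/covariance) feature space and to learn the weights jointly rather than per patch.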
{"title":"Repeated Look-up Tables.","authors":"Erik Reinhard, Elena Garces, Jurgen Stauder","doi":"10.1109/TIP.2019.2949245","DOIUrl":"10.1109/TIP.2019.2949245","url":null,"abstract":"<p><p>Efficient hardware implementations routinely approximate mathematical functions with look-up tables (LUTs), while keeping the error of the approximation under control. For a certain class of commonly occurring 1D functions, namely monotonically increasing or decreasing functions, we found that it is possible to approximate such functions by repeated application of a very low-resolution 1D look-up table. Cascading multiple identical LUTs has many advantages, including the promise of a very simple hardware design and the use of standard linear interpolation. Further, the complexity associated with unequal bin sizes can be avoided. We show that for realistic applications, including gamma correction, high dynamic range encoding and decoding curves, as well as tone mapping and inverse tone mapping, multiple cascaded look-up tables can reduce the approximation error by more than 50% compared to a single look-up table with the same total memory footprint.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62590989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
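The cascading idea can be illustrated with a functional-root construction: to realize a monotone target g with n identical stages, store the functional n-th root of g in one small LUT and apply it n times. The concrete numbers below (a gamma curve x**(1/2.2), a 17-entry LUT, three stages) are assumptions for illustration; the paper's LUT optimization may differ.

```python
import numpy as np

def apply_lut(lut, x):
    """Linearly interpolate into an equally spaced 1-D LUT over [0, 1]."""
    return np.interp(x, np.linspace(0.0, 1.0, len(lut)), lut)

# Each stage stores x**a with a = gamma**(1/n), so that n cascaded
# applications compose to (x**a)**a**... = x**gamma, the target curve.
gamma, n, size = 1.0 / 2.2, 3, 17
xs = np.linspace(0.0, 1.0, size)
stage_lut = xs ** (gamma ** (1.0 / n))

x = np.linspace(0.0, 1.0, 1001)
y = x
for _ in range(n):  # repeated application of the same small LUT
    y = apply_lut(stage_lut, y)
```

Each stage is much shallower than the full gamma curve (exponent about 0.77 instead of 0.45 here), which is exactly the kind of curve that a coarse, equally spaced LUT with linear interpolation handles well.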
{"title":"Two-Dimensional Quaternion Sparse Discriminant Analysis.","authors":"Xiaolin Xiao, Yongyong Chen, Yue-Jiao Gong, Yicong Zhou","doi":"10.1109/TIP.2019.2947775","DOIUrl":"10.1109/TIP.2019.2947775","url":null,"abstract":"<p><p>Linear discriminant analysis has been incorporated with various representations and measurements for dimension reduction and feature extraction. In this paper, we propose two-dimensional quaternion sparse discriminant analysis (2D-QSDA), which meets the requirements of representing RGB and RGB-D images. 2D-QSDA offers three advances: 1) by including sparse regularization, 2D-QSDA relies only on the important variables and thus generalizes well to out-of-sample data unseen during the training phase; 2) benefiting from the quaternion representation, 2D-QSDA well preserves the high-order correlation among different image channels and provides a unified approach to extract features from RGB and RGB-D images; 3) the spatial structure of the input images is retained via matrix-based processing. We tackle the constrained trace ratio problem of 2D-QSDA by solving a corresponding constrained trace difference problem, which is then transformed into a quaternion sparse regression (QSR) model. Afterward, we reformulate the QSR model in an equivalent complex form to avoid processing the complicated structure of quaternions. A nested iterative algorithm is designed to learn the solution of 2D-QSDA in the complex space, and we then convert this solution back to the quaternion domain. To improve the separability of 2D-QSDA, we further propose 2D-QSDAw using weighted pairwise between-class distances. 
Extensive experiments on RGB and RGB-D databases demonstrate the effectiveness of 2D-QSDA and 2D-QSDAw compared with peer competitors.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62590493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BMAN: Bidirectional Multi-scale Aggregation Networks for Abnormal Event Detection.","authors":"Sangmin Lee, Hak Gu Kim, Yong Man Ro","doi":"10.1109/TIP.2019.2948286","DOIUrl":"10.1109/TIP.2019.2948286","url":null,"abstract":"<p><p>Abnormal event detection is an important task in video surveillance systems. In this paper, we propose novel bidirectional multi-scale aggregation networks (BMAN) for abnormal event detection. The proposed BMAN learns spatiotemporal patterns of normal events and detects deviations from the learned normal patterns as abnormalities. The BMAN consists of two main parts: an inter-frame predictor and an appearance-motion joint detector. The inter-frame predictor is devised to encode normal patterns, generating an inter-frame using attention-based bidirectional multi-scale aggregation. With this feature aggregation, robustness to object scale variations and complex motions is achieved in normal pattern encoding. Based on the encoded normal patterns, abnormal events are detected by the appearance-motion joint detector, in which both the appearance and motion characteristics of scenes are considered. Comprehensive experiments are performed, and the results show that the proposed method outperforms the existing state-of-the-art methods. The resulting abnormal event detection is interpretable on the visual basis of where the detected events occur. 
Further, we validate the effectiveness of the proposed network designs through an ablation study and feature visualization.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62590531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low cost gaze estimation: knowledge-based solutions.","authors":"Ion Martinikorena, Andoni Larumbe-Bergera, Mikel Ariz, Sonia Porta, Rafael Cabeza, Arantxa Villanueva","doi":"10.1109/TIP.2019.2946452","DOIUrl":"10.1109/TIP.2019.2946452","url":null,"abstract":"<p><p>Eye tracking technology in low-resolution scenarios is not a completely solved issue to date. Using eye tracking in a mobile device is a challenging objective that would allow this technology to spread to unexplored fields. In this paper, a knowledge-based approach is presented to solve gaze estimation in low-resolution settings. An understanding of the high-resolution paradigm permits us to propose alternative models for gaze estimation. In this manner, three models are presented: a geometrical model, an interpolation model and a compound model, as solutions for gaze estimation in remote low-resolution systems. Since this work considers head position essential to improving gaze accuracy, a method for head pose estimation is also proposed. The methods are validated in an optimal framework, the I2Head database, which combines head and gaze data. The experimental validation of the models demonstrates their sensitivity to image processing inaccuracies, critical in the case of the geometrical model. Static and extreme-movement scenarios are analyzed, showing the higher robustness of the compound and geometrical models in the presence of user displacement. 
Accuracy values of about 3° have been obtained, increasing to values close to 5° in extreme displacement settings; these results are fully comparable with the state of the art.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62590226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-Motion-Assisted Tensor Completion Method for Background Initialization in Complex Video Sequences.","authors":"Ibrahim Kajo, Nidal Kamel, Yassine Ruichek","doi":"10.1109/TIP.2019.2946098","DOIUrl":"10.1109/TIP.2019.2946098","url":null,"abstract":"<p><p>The background initialization (BI) problem has attracted the attention of researchers in different image/video processing fields. Recently, a tensor-based technique called spatiotemporal slice-based singular value decomposition (SS-SVD) has been proposed for background initialization. SS-SVD applies the SVD on the tensor slices and estimates the background from low-rank information. Despite its efficiency in background initialization, the performance of SS-SVD requires further improvement in the case of complex sequences with challenges such as stationary foreground objects (SFOs), illumination changes, low frame-rate, and clutter. In this paper, a self-motion-assisted tensor completion method is proposed to overcome the limitations of SS-SVD in complex video sequences and enhance the visual appearance of the initialized background. With the proposed method, the motion information, extracted from the sparse portion of the tensor slices, is incorporated with the low-rank information of SS-SVD to eliminate existing artifacts in the initiated background. Efficient blending schemes between the low-rank (background) and sparse (foreground) information of the tensor slices are developed for scenarios such as SFO removal, lighting variation processing, low frame-rate processing, crowdedness estimation, and best frame selection. The performance of the proposed method on video sequences with complex scenarios is compared with the top-ranked state-of-the-art techniques in the field of background initialization. 
The results not only validate the improved performance on the majority of the tested challenges but also demonstrate the capability of the proposed method to initialize the background in less computation time.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62590002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
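The low-rank intuition behind SVD-based background initialization can be sketched with a plain SVD on the flattened frame matrix: the dominant singular component captures the static background, while sparse foreground activity is left in the residual. This is a simplified stand-in for the paper's slice-based, motion-assisted method, not the method itself.

```python
import numpy as np

def background_svd(frames):
    """Estimate a static background as the rank-1 SVD term of the
    frame matrix (one flattened frame per row), averaged over time."""
    T = np.stack([f.ravel() for f in frames])      # (num_frames, num_pixels)
    u, s, vt = np.linalg.svd(T, full_matrices=False)
    rank1 = s[0] * np.outer(u[:, 0], vt[0])        # low-rank (background) part
    return rank1.mean(axis=0).reshape(frames[0].shape)
```

When the foreground is sparse and moving, each pixel is dominated by its background value across time, so the leading singular vector aligns with the background; the stationary-foreground and low-frame-rate failure cases discussed above are precisely where this assumption breaks down.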