Yunhang Shen, Rongrong Ji, Kuiyuan Yang, Cheng Deng, Changhu Wang
{"title":"Category-Aware Spatial Constraint for Weakly Supervised Detection.","authors":"Yunhang Shen, Rongrong Ji, Kuiyuan Yang, Cheng Deng, Changhu Wang","doi":"10.1109/TIP.2019.2933735","DOIUrl":"10.1109/TIP.2019.2933735","url":null,"abstract":"<p><p>Weakly supervised object detection has attracted increasing research attention recently. To this end, most existing schemes rely on scoring category-independent region proposals, which is formulated as a multiple instance learning problem. During this process, the proposal scores are aggregated and supervised by only image-level labels, which often fails to locate object boundaries precisely. In this paper, we break through such a restriction by taking a deeper look into the score aggregation stage and propose a Category-aware Spatial Constraint (CSC) scheme for proposals, which is integrated into weakly supervised object detection in an end-to-end learning manner. In particular, we incorporate the global shape information of objects as an unsupervised constraint, which is inferred from built-in foreground-and-background cues, termed Category-specific Pixel Gradient (CPG) maps. Specifically, each region proposal is weighted according to how well it covers the estimated shape of objects. For each category, a multi-center regularization is further introduced to penalize the violations between cluster centers and high-score proposals in a given image. 
Extensive experiments on the widely used Pascal VOC and COCO benchmarks show that our approach significantly improves weakly supervised object detection without adding new learnable parameters to the existing models or changing the structures of CNNs.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62584864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Key-point Detector based on Sparse Coding.","authors":"Thanh Hong-Phuoc, Ling Guan","doi":"10.1109/TIP.2019.2934891","DOIUrl":"10.1109/TIP.2019.2934891","url":null,"abstract":"<p><p>Most popular hand-crafted key-point detectors such as Harris corner, MSER, SIFT, SURF rely on some specific pre-designed structures for detection of corners, blobs, or junctions in an image. The very nature of pre-designed structures can be considered a source of inflexibility for these detectors in different contexts. Additionally, the performance of these detectors is also highly affected by non-uniform change in illumination. To the best of our knowledge, while there are some previous works addressing one of the two aforementioned problems, there currently lacks an efficient method to solve both simultaneously. In this paper, we propose a novel Sparse Coding based Key-point detector (SCK) which is fully invariant to affine intensity change and independent of any particular structure. The proposed detector locates a key-point in an image, based on a complexity measure calculated from the block surrounding its position. A strength measure is also proposed for comparing and selecting the detected key-points when the maximum number of key-points is limited. In this paper, the desirable characteristics of the proposed detector are theoretically confirmed. 
Experimental results on three public datasets also show that the proposed detector achieves significantly high performance in terms of repeatability and matching score.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62585479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jichang Li, Si Wu, Cheng Liu, Zhiwen Yu, Hau-San Wong
{"title":"Semi-Supervised Deep Coupled Ensemble Learning with Classification Landmark Exploration.","authors":"Jichang Li, Si Wu, Cheng Liu, Zhiwen Yu, Hau-San Wong","doi":"10.1109/TIP.2019.2933724","DOIUrl":"10.1109/TIP.2019.2933724","url":null,"abstract":"<p><p>Using an ensemble of neural networks with consistency regularization is effective for improving performance and stability of deep learning, compared to the case of a single network. In this paper, we present a semi-supervised Deep Coupled Ensemble (DCE) model, which contributes to ensemble learning and classification landmark exploration for better locating the final decision boundaries in the learnt latent space. First, multiple complementary consistency regularizations are integrated into our DCE model to enable the ensemble members to learn from each other and themselves, such that training experience from different sources can be shared and utilized during training. Second, in view of the possibility of producing incorrect predictions on a number of difficult instances, we adopt class-wise mean feature matching to explore important unlabeled instances as classification landmarks, on which the model predictions are more reliable. Minimizing the weighted conditional entropy on unlabeled data is able to force the final decision boundaries to move away from important training data points, which facilitates semi-supervised learning. Ensemble members could eventually have similar performance due to consistency regularization, and thus only one of these members is needed during the test stage, such that the efficiency of our model is the same as the non-ensemble case. 
Extensive experimental results demonstrate the superiority of our proposed DCE model over existing state-of-the-art semi-supervised learning methods.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62584761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Huanhua Liu, Yun Zhang, Huan Zhang, Chunling Fan, Sam Kwong, C-C Jay Kuo, Xiaoping Fan
{"title":"Deep Learning based Picture-Wise Just Noticeable Distortion Prediction Model for Image Compression.","authors":"Huanhua Liu, Yun Zhang, Huan Zhang, Chunling Fan, Sam Kwong, C-C Jay Kuo, Xiaoping Fan","doi":"10.1109/TIP.2019.2933743","DOIUrl":"10.1109/TIP.2019.2933743","url":null,"abstract":"<p><p>Picture Wise Just Noticeable Difference (PW-JND), which accounts for the minimum difference of a picture that the human visual system can perceive, can be widely used in perception-oriented image and video processing. However, the conventional Just Noticeable Difference (JND) models calculate the JND threshold for each pixel or sub-band separately, which may not reflect the total masking effect of a picture accurately. In this paper, we propose a deep learning based PW-JND prediction model for image compression. Firstly, we formulate the task of predicting PW-JND as a multi-class classification problem, and propose a framework to transform the multi-class classification problem into a binary classification problem solved by just one binary classifier. Secondly, we construct a deep learning based binary classifier, named the perceptually lossy/lossless predictor, which can predict whether an image is perceptually lossy to another or not. Finally, we propose a sliding window based search strategy to predict PW-JND based on the prediction results of the perceptually lossy/lossless predictor. 
Experimental results show that the mean accuracy of the perceptually lossy/lossless predictor reaches 92%, and the absolute prediction error of the proposed PW-JND model is 0.79 dB on average, which shows the superiority of the proposed PW-JND model to the conventional JND models.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62584735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Brian Z Bentz, Dergan Lin, Justin A Patel, Kevin J Webb
{"title":"Multiresolution Localization with Temporal Scanning for Super-Resolution Diffuse Optical Imaging of Fluorescence.","authors":"Brian Z Bentz, Dergan Lin, Justin A Patel, Kevin J Webb","doi":"10.1109/TIP.2019.2931080","DOIUrl":"10.1109/TIP.2019.2931080","url":null,"abstract":"<p><p>A super-resolution optical imaging method is presented that relies on the distinct temporal information associated with each fluorescent optical reporter to determine its spatial position to high precision with measurements of heavily scattered light. This multiple-emitter localization approach uses a diffusion equation forward model in a cost function, and has the potential to achieve micron-scale spatial resolution through centimeters of tissue. Utilizing some degree of temporal separation for the reporter emissions, position and emission strength are determined using a computationally efficient time stripping multiresolution algorithm. The approach circumvents the spatial resolution challenges faced by earlier optical imaging approaches using a diffusion equation forward model, and is promising for in vivo applications. For example, in principle, the method could be used to localize individual neurons firing throughout a rodent brain, enabling direct imaging of neural network activity.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7012689/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62583594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Summit Navigator: A Novel Approach for Local Maxima Extraction.","authors":"Tran Hiep Dinh, Manh Duong Phung, Quang Phuc Ha","doi":"10.1109/TIP.2019.2932501","DOIUrl":"10.1109/TIP.2019.2932501","url":null,"abstract":"<p><p>This paper presents a novel method, called the Summit Navigator, to effectively extract local maxima of an image histogram for multi-object segmentation of images. After smoothing with a moving average filter, the obtained histogram is analyzed, based on the data density and distribution to find the best observing location. An observability index for each initial peak is proposed to evaluate if it can be considered as dominant by using the calculated observing location. Recursive algorithms are then developed for peak searching and merging to remove any false detection of peaks that are located on one side of each mode. Experimental results demonstrated the advantages of the proposed approach in terms of accuracy and consistency in different reputable datasets.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62584373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Weiqing Min, Shuhuan Mei, Linhu Liu, Yi Wang, Shuqiang Jiang
{"title":"Multi-Task Deep Relative Attribute Learning for Visual Urban Perception.","authors":"Weiqing Min, Shuhuan Mei, Linhu Liu, Yi Wang, Shuqiang Jiang","doi":"10.1109/TIP.2019.2932502","DOIUrl":"10.1109/TIP.2019.2932502","url":null,"abstract":"<p><p>Visual urban perception aims to quantify perceptual attributes (e.g., safe and depressing attributes) of physical urban environment from crowd-sourced street-view images and their pairwise comparisons. It has been receiving more and more attention in computer vision for various applications, such as perceptive attribute learning and urban scene understanding. Most existing methods adopt either (i) a regression model trained using image features and ranked scores converted from pairwise comparisons for perceptual attribute prediction or (ii) a pairwise ranking algorithm to independently learn each perceptual attribute. However, the former fails to directly exploit pairwise comparisons while the latter ignores the relationship among different attributes. To address them, we propose a Multi-Task Deep Relative Attribute Learning Network (MTDRALN) to learn all the relative attributes simultaneously via multi-task Siamese networks, where each Siamese network will predict one relative attribute. Combined with deep relative attribute learning, we utilize the structured sparsity to exploit the prior from natural attribute grouping, where all the attributes are divided into different groups based on semantic relatedness in advance. As a result, MTDRALN is capable of learning all the perceptual attributes simultaneously via multi-task learning. Besides the ranking sub-network, MTDRALN further introduces the classification sub-network, and these two types of losses from two sub-networks jointly constrain parameters of the deep network to make the network learn more discriminative visual features for relative attribute learning. 
In addition, our network can be trained in an end-to-end way to make deep feature learning and multi-task relative attribute learning reinforce each other. Extensive experiments on the large-scale Place Pulse 2.0 dataset validate the advantage of our proposed network. Our qualitative results along with visualization of saliency maps also show that the proposed network is able to learn effective features for perceptual attributes.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62584443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Morphology-based Noise Reduction: Structural Variation and Thresholding in the Bitonic Filter.","authors":"Graham Treece","doi":"10.1109/TIP.2019.2932572","DOIUrl":"10.1109/TIP.2019.2932572","url":null,"abstract":"<p><p>The bitonic filter was recently developed to embody the novel concept of signal bitonicity (one local extremum within a set range) to differentiate from noise, by use of data ranking and linear operators. For processing images, the spatial extent was locally constrained to a fixed circular mask. Since structure in natural images varies, a novel structurally varying bitonic filter is presented, which locally adapts the mask, without following patterns in the noise. This new filter includes novel robust structurally varying morphological operations, with efficient implementations, and a novel formulation of non-iterative directional Gaussian filtering. Data thresholds are also integrated with the morphological operations, increasing noise reduction for low noise, and enabling a multi-resolution framework for high noise levels. The structurally varying bitonic filter is presented without presuming prior knowledge of morphological filtering, and compared to high-performance linear noise-reduction filters, to set this novel concept in context. These are tested over a wide range of noise levels, on a fairly broad set of images. The new filter is a considerable improvement on the fixed-mask bitonic, outperforms anisotropic diffusion and image-guided filtering in all but extremely low noise, non-local means at all noise levels, but not the block-matching 3D filter, though results are promising for very high noise. The structurally varying bitonic tends to have less characteristic residual noise in regions of smooth signal, and very good preservation of signal edges, though with some loss of small scale detail when compared to the block-matching 3D filter. 
The efficient implementation means that processing time, though slower than the fixed-mask bitonic filter, remains competitive.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62584099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yun Zhang, Huan Zhang, Mei Yu, Sam Kwong, Yo-Sung Ho
{"title":"Sparse Representation based Video Quality Assessment for Synthesized 3D Videos.","authors":"Yun Zhang, Huan Zhang, Mei Yu, Sam Kwong, Yo-Sung Ho","doi":"10.1109/TIP.2019.2929433","DOIUrl":"10.1109/TIP.2019.2929433","url":null,"abstract":"<p><p>The temporal flicker distortion is one of the most annoying noises in synthesized virtual view videos when they are rendered by compressed multi-view video plus depth in a Three Dimensional (3D) video system. To assess the synthesized view video quality and further optimize the compression techniques in the 3D video system, an objective video quality assessment which can accurately measure the flicker distortion is highly needed. In this paper, we propose a full reference sparse representation based video quality assessment method towards synthesized 3D videos. Firstly, a synthesized video, treated as 3D volume data with spatial (X-Y) and temporal (T) domains, is reformed and decomposed as a number of spatially neighboring temporal layers, i.e., X-T or Y-T planes. Gradient features in temporal layers of the synthesized video and strong edges of depth maps are used as key features in detecting the location of flicker distortions. Secondly, dictionary learning and sparse representation for the temporal layers are then derived and applied to effectively represent the temporal flicker distortion. Thirdly, a rank pooling method is used to pool all the temporal layer scores and obtain the score for the flicker distortion. Finally, the temporal flicker distortion measurement is combined with the conventional spatial distortion measurement to assess the quality of synthesized 3D videos. 
Experimental results on a synthesized video quality database demonstrate that our proposed method is significantly superior to other state-of-the-art methods, especially on the view synthesis distortions induced from depth videos.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62583733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhanced Fuzzy-based Local Information Algorithm for Sonar Image Segmentation.","authors":"Avi Abu, Roee Diamant","doi":"10.1109/TIP.2019.2930148","DOIUrl":"10.1109/TIP.2019.2930148","url":null,"abstract":"<p><p>The recent boost in undersea operations has led to the development of high-resolution sonar systems mounted on autonomous vehicles. These vehicles are used to scan the seafloor in search of different objects such as sunken ships, archaeological sites, and submerged mines. An important part of the detection operation is the segmentation of sonar images, where the object's highlight and shadow are distinguished from the seabed background. In this work, we focus on the automatic segmentation of sonar images. We present our enhanced fuzzy-based with Kernel metric (EnFK) algorithm for the segmentation of sonar images which, in an attempt to improve segmentation accuracy, introduces two new fuzzy terms of local spatial and statistical information. Our algorithm includes a preliminary de-noising algorithm which, together with the original image, feeds into the segmentation procedure to avoid being trapped in local minima and to improve convergence. The result is a segmentation procedure that specifically suits the intensity inhomogeneity and the complex seabed texture of sonar images. We tested our approach using simulated images, real sonar images, and sonar images that we created in two different sea experiments, using multibeam sonar and synthetic aperture sonar. 
The results show accurate segmentation performance that is far beyond the state-of-the-art results.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62583632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}