{"title":"Homologous Component Analysis for Domain Adaptation.","authors":"Youfa Liu, Weiping Tu, Bo Du, Lefei Zhang, Dacheng Tao","doi":"10.1109/TIP.2019.2929421","DOIUrl":"10.1109/TIP.2019.2929421","url":null,"abstract":"<p><p>Domain adaptation approaches based on the covariate shift assumption usually employ a single common transformation to align marginal distributions while preserving conditional distributions. However, one common transformation may lose useful information, such as variances and neighborhood relationships, in both the source and target domains. To address this problem, we propose a novel method called homologous component analysis (HCA), which seeks two different but homologous transformations that align distributions with side information while preserving conditional distributions. As it is hard to find a closed-form solution to the corresponding optimization problem, we solve it by means of the alternating direction minimizing method (ADMM) in the context of Stiefel manifolds. We also provide a generalization error bound for domain adaptation in the semi-supervised case and show that two transformations can decrease this upper bound more than a single common transformation does. Extensive experiments on synthetic and real data show the effectiveness of the proposed method by comparing its classification accuracy with state-of-the-art methods, and numerical evidence based on the chordal and Frobenius distances shows that the resulting optimal transformations are indeed different.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62583614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
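The HCA abstract above compares its two learned transformations via chordal and Frobenius distances. As a hedged illustration only (the HCA transformations themselves are not reproduced here), these two distances between transformation matrices with orthonormal columns can be computed as follows; the 1/sqrt(2) scaling on the chordal distance is one common convention, not necessarily the one used in the paper:

```python
import numpy as np

def frobenius_distance(A, B):
    """Plain Frobenius distance between two transformation matrices."""
    return np.linalg.norm(A - B, "fro")

def chordal_distance(U, V):
    """Chordal distance between the subspaces spanned by two Stiefel
    matrices (orthonormal columns), via their projection matrices.
    The 1/sqrt(2) factor is one common normalization convention."""
    return np.linalg.norm(U @ U.T - V @ V.T, "fro") / np.sqrt(2)

# Two one-dimensional subspaces of R^2: the x-axis and the y-axis.
U = np.array([[1.0], [0.0]])
V = np.array([[0.0], [1.0]])
```

A nonzero value under either distance indicates the two transformations (or the subspaces they span) genuinely differ, which is the kind of numerical evidence the abstract refers to.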
{"title":"Unambiguous Scene Text Segmentation with Referring Expression Comprehension.","authors":"Xuejian Rong, Chucai Yi, Yingli Tian","doi":"10.1109/TIP.2019.2930176","DOIUrl":"10.1109/TIP.2019.2930176","url":null,"abstract":"<p><p>Text instances provide valuable information for the understanding and interpretation of natural scenes. The rich, precise high-level semantics embodied in text can help us understand the world around us and empower a wide range of real-world applications. While most recent visual phrase grounding approaches focus on general objects, this paper explores extracting designated text and predicting an unambiguous scene text segmentation mask, i.e., scene text segmentation from natural language descriptions (referring expressions) such as orange text on a little boy in black swinging a bat. Solving this novel problem enables accurate segmentation of scene text instances from complex backgrounds. In our proposed framework, a unified deep network jointly models visual and linguistic information by encoding both region-level and pixel-level visual features of natural scene images into spatial feature maps and then decoding them into a saliency response map of text instances. To conduct quantitative evaluations, we establish a new scene text referring expression segmentation dataset: COCO-CharRef. Experimental results demonstrate the effectiveness of the proposed framework on the text instance segmentation task. By combining image-based visual features with language-based textual explanations, our framework outperforms baselines derived from state-of-the-art text localization and natural language object retrieval methods on the COCO-CharRef dataset.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62583753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hyperspectral Image Denoising via Matrix Factorization and Deep Prior Regularization.","authors":"Baihong Lin, Xiaoming Tao, Jianhua Lu","doi":"10.1109/TIP.2019.2928627","DOIUrl":"10.1109/TIP.2019.2928627","url":null,"abstract":"<p><p>Deep learning has been successfully introduced for 2D-image denoising, but it remains unsatisfactory for hyperspectral image (HSI) denoising due to the unacceptable computational complexity of end-to-end training and the difficulty of building a universal 3D-image training dataset. In this paper, instead of developing an end-to-end deep denoising network, we propose a hyperspectral image denoising framework for the removal of mixed Gaussian impulse noise, in which the denoising problem is modeled as a convolutional neural network (CNN)-constrained non-negative matrix factorization problem. Using proximal alternating linearized minimization, the optimization can be divided into three steps: updating the spectral matrix, updating the abundance matrix, and estimating the sparse noise. We then design the CNN architecture and propose two training schemes that allow the CNN to be trained on a 2D-image dataset. Compared with state-of-the-art denoising methods, the proposed method performs well on the removal of Gaussian and mixed Gaussian impulse noise. More importantly, the proposed model need only be trained once on a 2D-image dataset, yet can denoise HSIs with different numbers of channel bands.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62583046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Weighted Guided Image Filtering with Steering Kernel.","authors":"Zhonggui Sun, Bo Han, Jie Li, Jin Zhang, Xinbo Gao","doi":"10.1109/TIP.2019.2928631","DOIUrl":"10.1109/TIP.2019.2928631","url":null,"abstract":"<p><p>Due to its local property, the guided image filter (GIF) generally suffers from halo artifacts near edges. To remedy this deficiency, a weighted guided image filter (WGIF) was recently proposed that incorporates an edge-aware weighting into the filtering process. It combines the advantages of local and global operations and achieves better edge-preserving performance. However, edge direction, a vital property of the guidance image, is not fully considered in these guided filters. To overcome this drawback, we propose a novel version of the GIF that leverages edge direction more fully. In particular, we utilize the steering kernel to adaptively learn the direction and incorporate the learning results into the filtering process to improve the filter's behavior. Theoretical analysis shows that the proposed method preserves edges and reduces halo artifacts more effectively. Similar conclusions are reached through thorough experiments, including edge-aware smoothing, detail enhancement, denoising, and dehazing.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62583151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
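For context, the baseline guided image filter that WGIF and the steering-kernel variant above build on fits a local linear model q = a*I + b in each window. A minimal NumPy sketch of that classic baseline (He et al.'s GIF, not the edge-aware weighting or steering-kernel extensions, and with an illustrative eps) might look like:

```python
import numpy as np

def box_filter(img, r):
    """Mean over a (2r+1)x(2r+1) window with edge padding, via 2-D cumulative sums."""
    k = 2 * r + 1
    pad = np.pad(img, r, mode="edge")
    c = np.cumsum(np.cumsum(pad, axis=0), axis=1)
    c = np.pad(c, ((1, 0), (1, 0)))  # prepend a zero row/column so window sums are differences
    return (c[k:, k:] - c[:-k, k:] - c[k:, :-k] + c[:-k, :-k]) / (k * k)

def guided_filter(I, p, r=2, eps=0.2):
    """Baseline guided image filter: fit q = a*I + b per window, then
    average the (a, b) coefficients over all windows covering each pixel.
    I: guidance image, p: filtering input; both 2-D float arrays."""
    mean_I, mean_p = box_filter(I, r), box_filter(p, r)
    var_I = box_filter(I * I, r) - mean_I ** 2
    cov_Ip = box_filter(I * p, r) - mean_I * mean_p
    a = cov_Ip / (var_I + eps)  # eps regularizes flat regions (smoothing strength)
    b = mean_p - a * mean_I
    return box_filter(a, r) * I + box_filter(b, r)
```

The locality the abstract criticizes is visible here: every quantity is a per-window statistic, with no global term and no use of edge direction.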
{"title":"Re-Caption: Saliency-Enhanced Image Captioning through Two-Phase Learning.","authors":"Lian Zhou, Yuejie Zhang, Yugang Jiang, Tao Zhang, Weiguo Fan","doi":"10.1109/TIP.2019.2928144","DOIUrl":"10.1109/TIP.2019.2928144","url":null,"abstract":"<p><p>Visual and semantic saliency are important in image captioning. However, single-phase image captioning benefits little from limited saliency without a saliency predictor. In this paper, a novel saliency-enhanced re-captioning framework via two-phase learning is proposed to enhance single-phase image captioning. In the framework, visual saliency and semantic saliency are distilled from the first-phase model and fused with the second-phase model for model self-boosting. The visual saliency mechanism can generate a saliency map and a saliency mask for an image without learning a saliency map predictor. The semantic saliency mechanism sheds some light on the properties of words tagged as nouns in a caption. In addition, a third type of saliency, sample saliency, is proposed to explicitly compute the saliency degree of each sample, which supports more robust image captioning. We also examine how to combine the three types of saliency for a further performance boost. Our framework can treat an image captioning model as a saliency extractor, which may benefit other captioning models and related tasks. The experimental results on both the Flickr30k and MSCOCO datasets show that the saliency-enhanced models obtain promising performance gains.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62583477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Modality-Specific Representations for Visible-Infrared Person Re-Identification.","authors":"Zhanxiang Feng, Jianhuang Lai, Xiaohua Xie","doi":"10.1109/TIP.2019.2928126","DOIUrl":"10.1109/TIP.2019.2928126","url":null,"abstract":"<p><p>Traditional person re-identification (re-id) methods perform poorly under changing illumination. This situation can be addressed by using dual cameras that capture visible images in bright environments and infrared images in dark environments. This scheme, however, requires solving the visible-infrared matching problem, which is largely under-studied. Matching pedestrians across heterogeneous modalities is extremely challenging because of their different visual characteristics. In this paper, we propose a novel framework that employs modality-specific networks to tackle the heterogeneous matching problem. The proposed framework utilizes modality-related information and extracts modality-specific representations (MSR) by constructing an individual network for each modality. In addition, a cross-modality Euclidean constraint is introduced to narrow the gap between different networks. We also integrate modality-shared layers into the modality-specific networks to extract shareable information and use a modality-shared identity loss to facilitate the extraction of modality-invariant features. A modality-specific discriminant metric is then learned for each domain to strengthen the discriminative power of the MSR. Finally, we use a view classifier to learn view information. The experiments demonstrate that the MSR effectively improves the performance of deep networks on VI-REID and remarkably outperforms the state-of-the-art methods.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62583176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inpainting vs denoising for dose reduction in scanning-beam microscopies.","authors":"Toby Sanders, Christian Dwyer","doi":"10.1109/TIP.2019.2928133","DOIUrl":"10.1109/TIP.2019.2928133","url":null,"abstract":"<p><p>We consider sampling strategies for reducing the radiation dose during image acquisition in scanning-beam microscopies such as SEM, STEM, and STXM. Our basic assumption is that we may acquire subsampled image data (with some pixels missing) and then inpaint the missing data using a compressed-sensing approach. Our noise model consists of Poisson noise plus random Gaussian noise. We include the possibility of acquiring fully sampled image data, in which case the inpainting approach reduces to a denoising procedure. We use numerical simulations to compare the accuracy of reconstructed images with the \"ground truths.\" The results generally indicate that, for sufficiently high radiation doses, higher sampling rates achieve greater accuracy, commensurate with well-established literature. However, for very low radiation doses, where the Poisson noise and/or random Gaussian noise begins to dominate, our results indicate that subsampling and inpainting can result in smaller reconstruction errors. We also present an information-theoretic analysis, which allows us to quantify the amount of information gained through the different sampling strategies and enables some broader discussion of the main results.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62583239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
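The acquisition model in the abstract above (Poisson shot noise plus additive Gaussian noise, on a subsampled pixel grid) is straightforward to simulate. The sketch below is a generic illustration of that forward model only, not the authors' reconstruction code; the `dose` and `read_sigma` parameters are illustrative assumptions:

```python
import numpy as np

def acquire(truth, dose, sample_frac, read_sigma=0.01, seed=0):
    """Simulate one subsampled scanning-beam acquisition.
    truth: clean image with values in [0, 1]; dose: mean particle count
    per sampled pixel; sample_frac: fraction of pixels actually scanned.
    Unsampled pixels are set to NaN, to be filled later by inpainting."""
    rng = np.random.default_rng(seed)
    mask = rng.random(truth.shape) < sample_frac    # which pixels get scanned
    counts = rng.poisson(truth * dose) / dose       # normalized Poisson shot noise
    noisy = counts + rng.normal(0.0, read_sigma, truth.shape)  # Gaussian read noise
    out = np.full(truth.shape, np.nan)
    out[mask] = noisy[mask]
    return out, mask
```

With `sample_frac=1.0` this reduces to the fully-sampled (denoising) regime; lowering `dose` while varying `sample_frac` reproduces the trade-off the paper studies.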
{"title":"Path-Based Dictionary Augmentation: A Framework for Improving k-Sparse Image Processing.","authors":"Tegan H Emerson, Colin Olson, Timothy Doster","doi":"10.1109/TIP.2019.2927331","DOIUrl":"10.1109/TIP.2019.2927331","url":null,"abstract":"<p><p>We have previously shown that augmenting orthogonal matching pursuit (OMP) with an additional step in the identification stage of each pursuit iteration yields improved k-sparse reconstruction and denoising performance relative to baseline OMP. At each iteration a \"path,\" or geodesic, is generated between the two dictionary atoms that are most correlated with the residual and from this path a new atom that has a greater correlation to the residual than either of the two bracketing atoms is selected. Here, we provide new computational results illustrating improvements in sparse coding and denoising on canonical datasets using both learned and structured dictionaries. Two methods of constructing a path are investigated for each dictionary type: the Euclidean geodesic formed by a linear combination of the two atoms and the 2-Wasserstein geodesic corresponding to the optimal transport map between the atoms. We prove here the existence of a higher-correlation atom in the Euclidean case under assumptions on the two bracketing atoms and introduce algorithmic modifications to improve the likelihood that the bracketing atoms meet those conditions. 
Although we demonstrate our augmentation on OMP alone, in general it may be applied to any reconstruction algorithm that relies on the selection and sorting of high-similarity atoms during an analysis or identification phase.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62583013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
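For reference, the baseline OMP that the path-based augmentation modifies can be sketched in a few lines. The paper's extra step (forming a geodesic between the two most-correlated atoms and selecting a new, higher-correlation atom along it) would slot into the identification stage marked in the comments; this is a generic textbook sketch, not the authors' code:

```python
import numpy as np

def omp(D, y, k):
    """Baseline orthogonal matching pursuit for y ~ D @ x with k-sparse x.
    D: dictionary whose columns are (approximately) unit-norm atoms."""
    residual = y.astype(float).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Identification stage: the path-based augmentation would generate
        # a geodesic between the two best atoms here and pick a new atom
        # along it instead of taking the single argmax.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # Re-fit all active coefficients by least squares, update residual.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x
```

The least-squares re-fit over the whole support at each iteration is what distinguishes OMP from plain matching pursuit, and it is the part the augmented identification step leaves untouched.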
{"title":"Exemplar-Based Recursive Instance Segmentation With Application to Plant Image Analysis.","authors":"Jin-Gang Yu, Yansheng Li, Changxin Gao, Hongxia Gao, Gui-Song Xia, Zhu Liang Yu, Yuanqing Li","doi":"10.1109/TIP.2019.2923571","DOIUrl":"10.1109/TIP.2019.2923571","url":null,"abstract":"<p><p>Instance segmentation is a challenging computer vision problem that lies at the intersection of object detection and semantic segmentation. Motivated by plant image analysis in the context of plant phenotyping, a recently emerging application field of computer vision, this paper presents the Exemplar-Based Recursive Instance Segmentation (ERIS) framework. A three-layer probabilistic model is first introduced to jointly represent hypotheses, voting elements, instance labels, and their connections. A recursive optimization algorithm is then developed to infer the maximum a posteriori (MAP) solution, which handles one instance at a time by alternating among three steps: detection, segmentation, and update. The proposed ERIS framework departs from previous work mainly in two respects. First, it is exemplar-based and model-free, achieving instance-level segmentation of a specific object class given only a handful of (typically fewer than 10) annotated exemplars. This merit enables its use when no massive manually labeled dataset is available for training strong classification models, as required by most existing methods. Second, instead of attempting to infer the solution in a single shot, which suffers from extremely high computational complexity, our recursive optimization strategy allows for reasonably efficient MAP inference in the full hypothesis space. In this work, the ERIS framework is instantiated for the specific application of plant leaf segmentation. Experiments are conducted on public benchmarks to demonstrate the superiority of our method in both effectiveness and efficiency in comparison with the state-of-the-art.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2019-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62582750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"End-to-End Single Image Fog Removal Using Enhanced Cycle Consistent Adversarial Networks","authors":"Wei Liu, Xianxu Hou, Jiang Duan, G. Qiu","doi":"10.1109/TIP.2020.3007844","DOIUrl":"https://doi.org/10.1109/TIP.2020.3007844","url":null,"abstract":"Single image defogging is a classical and challenging problem in computer vision. Existing methods mainly include handcrafted-prior-based methods, which rely on the atmospheric degradation model, and learning-based approaches, which require paired fog/fog-free training images. In practice, however, prior-based methods are prone to failure due to their inherent limitations, and paired training data are extremely difficult to acquire. Moreover, unpaired trainable defogging networks have received little study in this field. Thus, inspired by the CycleGAN principle, we have developed an end-to-end learning system that uses unpaired fog and fog-free training images, adversarial discriminators, and cycle consistency losses to automatically construct a fog removal system. Similar to CycleGAN, our system has two transformation paths: one maps fog images to the fog-free image domain and the other maps fog-free images to the fog image domain. Instead of a one-stage mapping, our system uses a two-stage mapping strategy in each transformation path to enhance the effectiveness of fog removal. Furthermore, we make explicit use of prior knowledge in the networks by embedding the atmospheric degradation principle and a sky prior for mapping fog-free images to the fog image domain. In addition, we contribute the first real-world natural fog/fog-free image dataset for defogging research. Our multiple real fog images dataset (MRFID) contains images of 200 natural outdoor scenes. For each scene, there is one clear image and four corresponding foggy images of different fog densities, manually selected from a sequence of images taken by a fixed camera over the course of one year. Qualitative and quantitative comparisons against several state-of-the-art methods on both synthetic and real-world images demonstrate that our approach is effective and performs favorably in recovering a clear image from a foggy image.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"29 1","pages":"7819-7833"},"PeriodicalIF":10.6,"publicationDate":"2019-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TIP.2020.3007844","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62591533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
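The cycle-consistency idea at the core of the system above (fog to fog-free and back should reproduce the input, and likewise in the other direction) reduces to two L1 reconstruction terms. A framework-agnostic NumPy sketch, with `G` and `F` standing in as placeholders for the two learned transformation paths (the real system uses deep networks and adds adversarial and prior terms):

```python
import numpy as np

def cycle_consistency_loss(G, F, fog_batch, clear_batch):
    """CycleGAN-style cycle losses. G: fog -> fog-free mapping,
    F: fog-free -> fog mapping; both take and return image arrays."""
    fog_cycle = np.mean(np.abs(F(G(fog_batch)) - fog_batch))        # fog -> clear -> fog
    clear_cycle = np.mean(np.abs(G(F(clear_batch)) - clear_batch))  # clear -> fog -> clear
    return fog_cycle + clear_cycle
```

The loss is zero exactly when the two mappings invert each other on the training batches, which is what lets the system train from unpaired fog and fog-free images.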