{"title":"Text detection based on convolutional neural networks with spatial pyramid pooling","authors":"Rui Zhu, Xiao-Jiao Mao, Qi-Hai Zhu, Ning Li, Yubin Yang","doi":"10.1109/ICIP.2016.7532514","DOIUrl":"https://doi.org/10.1109/ICIP.2016.7532514","url":null,"abstract":"Text detection is a difficult task due to the significant diversity of the texts appearing in natural scene images. In this paper, we propose a novel text descriptor, SPP-net, extracted by equipping the Convolutional Neural Network (CNN) with spatial pyramid pooling. We first compute the feature maps from the original text lines without any cropping or warping, and then generate fixed-size representations for text discrimination. Experimental results on the latest ICDAR 2011 and 2013 datasets show that the proposed descriptor outperforms state-of-the-art methods by a noticeable margin in F-measure, owing to its incorporation of multi-scale text information and its flexibility in describing text regions of different sizes and shapes.","PeriodicalId":6521,"journal":{"name":"2016 IEEE International Conference on Image Processing (ICIP)","volume":"167 1","pages":"1032-1036"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80530858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Content-adaptive low rank regularization for image denoising","authors":"Hangfan Liu, Xinfeng Zhang, Ruiqin Xiong","doi":"10.1109/ICIP.2016.7532928","DOIUrl":"https://doi.org/10.1109/ICIP.2016.7532928","url":null,"abstract":"Prior knowledge plays an important role in image denoising tasks. This paper utilizes the data of the input image to adaptively model the prior distribution. The proposed scheme is based on the observation that, for a natural image, a matrix consisting of its vectorized non-local similar patches is of low rank. We use a non-convex smooth surrogate for the low-rank regularization, and view the optimization problem from the empirical Bayesian perspective. In such a framework, a parameter-free distribution prior is derived from the grouped non-local similar image contents. Experimental results show that the proposed approach is highly competitive with several state-of-the-art denoising methods in PSNR and visual quality.","PeriodicalId":6521,"journal":{"name":"2016 IEEE International Conference on Image Processing (ICIP)","volume":"25 1 1","pages":"3091-3095"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82690449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-speed railway rod-insulator detection using segment clustering and deformable part models","authors":"Ye Han, Zhigang Liu, Dah-Jye Lee, Guinan Zhang, Miao Deng","doi":"10.1109/ICIP.2016.7533081","DOIUrl":"https://doi.org/10.1109/ICIP.2016.7533081","url":null,"abstract":"Catenary system maintenance is an important task in the operation of a high-speed railway system. Currently, the inspection of damaged parts in the catenary system is performed manually, which is often slow and unreliable. This paper proposes a method to detect and locate the rod-insulators in images taken from the high-speed railway catenary system. Sub-images containing bar-shaped devices such as cantilevers, struts, rods, and poles are first extracted from the image. Rod-insulators are then recognized and detected from these bar-shaped sub-images by using deformable part models and latent SVM. Experimental results show that the proposed method is able to locate rod-insulators accurately from the catenary image for the subsequent defect inspection process. The robustness of this method ensures its performance under different imaging conditions.","PeriodicalId":6521,"journal":{"name":"2016 IEEE International Conference on Image Processing (ICIP)","volume":"12 1","pages":"3852-3856"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82722752","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A weighted total variation approach for the atlas-based reconstruction of brain MR data","authors":"Mingli Zhang, Kuldeep Kumar, Christian Desrosiers","doi":"10.1109/ICIP.2016.7533177","DOIUrl":"https://doi.org/10.1109/ICIP.2016.7533177","url":null,"abstract":"Compressed sensing is a powerful approach to reconstruct high-quality images using a small number of samples. This paper presents a novel compressed sensing method that uses a probabilistic atlas to impose spatial constraints on the reconstruction of brain magnetic resonance imaging (MRI) data. A weighted total variation (TV) model is proposed to characterize the spatial distribution of gradients in the brain, and incorporate this information in the reconstruction process. Experiments on T1-weighted MR images from the ABIDE dataset show our proposed method to outperform the standard uniform TV model, as well as state-of-the-art approaches, for low sampling rates and high noise levels.","PeriodicalId":6521,"journal":{"name":"2016 IEEE International Conference on Image Processing (ICIP)","volume":"18 1","pages":"4329-4333"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89001823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"H.264 intra coding with transforms based on prediction inaccuracy modeling","authors":"Xun Cai, J. Lim","doi":"10.1109/ICIP.2016.7532785","DOIUrl":"https://doi.org/10.1109/ICIP.2016.7532785","url":null,"abstract":"In intra video coding, intra frames are predicted with intra prediction and the prediction residual signal is encoded. In many transform-based video coding systems, intra prediction residuals are encoded with transforms. For example, the Discrete Cosine Transform (DCT) and the Asymmetric Discrete Sine Transform (ADST) are used for intra prediction residuals in many coding systems. In recent work, a set of transforms based on prediction inaccuracy modeling (PIM) was proposed. These transforms are developed based on the observation that much of the residual non-stationarity is due to the use of an inaccurate prediction parameter. These transforms are shown to be effective for the non-stationarity that arises in directional intra prediction residuals. In this paper, we implement the transforms based on prediction inaccuracy modeling in the H.264 intra coding system. The proposed transform is used in hybrid with the ADST. We compare the performance of the hybrid transform with the ADST and show that a significant bit-rate reduction is obtained with the proposed transform.","PeriodicalId":6521,"journal":{"name":"2016 IEEE International Conference on Image Processing (ICIP)","volume":"16 1","pages":"2380-2384"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87048311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Characterizing distortions in first-person videos","authors":"Chen Bai, A. Reibman","doi":"10.1109/ICIP.2016.7532797","DOIUrl":"https://doi.org/10.1109/ICIP.2016.7532797","url":null,"abstract":"First-person videos (FPVs) captured by wearable cameras often contain heavy distortions, including motion blur, rolling shutter artifacts and rotation. Existing image and video quality estimators are inefficient for this type of video. We develop a method specifically to measure the distortions present in FPVs, without using a high quality reference video. Our local visual information (LVI) algorithm measures motion blur, and we combine homography estimation with line angle histogram to measure rolling shutter artifacts and rotation. Our experiments demonstrate that captured FPVs have dramatically different distortions compared to traditional source videos. We also show that LVI is responsive to motion blur, but insensitive to rotation and shear.","PeriodicalId":6521,"journal":{"name":"2016 IEEE International Conference on Image Processing (ICIP)","volume":"58 1","pages":"2440-2444"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86965780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HEVC still image coding and high efficiency image file format","authors":"J. Lainema, M. Hannuksela, V. Vadakital, Emre B. Aksu","doi":"10.1109/ICIP.2016.7532321","DOIUrl":"https://doi.org/10.1109/ICIP.2016.7532321","url":null,"abstract":"The High Efficiency Video Coding (HEVC) standard includes support for a large range of image representation formats and provides an excellent image compression capability. The High Efficiency Image File Format (HEIF) offers a convenient way to encapsulate HEVC coded images, image sequences and animations together with associated metadata into a single file. This paper discusses various features and functionalities of the HEIF file format and compares the compression efficiency of HEVC still image coding to that of JPEG 2000. According to the experimental results, HEVC provides about 25% bitrate reduction compared to JPEG 2000, while keeping the same objective picture quality.","PeriodicalId":6521,"journal":{"name":"2016 IEEE International Conference on Image Processing (ICIP)","volume":"2015 1","pages":"71-75"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87946528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"One class classification applied in facial image analysis","authors":"V. Mygdalis, Alexandros Iosifidis, A. Tefas, I. Pitas","doi":"10.1109/ICIP.2016.7532637","DOIUrl":"https://doi.org/10.1109/ICIP.2016.7532637","url":null,"abstract":"In this paper, we apply One-Class Classification methods in facial image analysis problems. We consider the cases where the available training data information originates from one class, or one of the available classes is of high importance. We propose a novel extension of the One-Class Extreme Learning Machines algorithm aiming at minimizing both the training error and the data dispersion, and consider solutions that generate decision functions in the ELM space, as well as in ELM spaces of arbitrary dimensionality. We evaluate the performance on publicly available datasets. The proposed method compares favourably to other state-of-the-art choices.","PeriodicalId":6521,"journal":{"name":"2016 IEEE International Conference on Image Processing (ICIP)","volume":"1 1","pages":"1644-1648"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89972185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effective color correction pipeline for a noisy image","authors":"Kenta Takahashi, Yusuke Monno, Masayuki Tanaka, M. Okutomi","doi":"10.1109/ICIP.2016.7533111","DOIUrl":"https://doi.org/10.1109/ICIP.2016.7533111","url":null,"abstract":"Color correction is an essential image processing operation that transforms a camera-dependent RGB color space to a standard color space, e.g., the XYZ or the sRGB color space. The color correction is typically performed by multiplying the camera RGB values by a color correction matrix, which often amplifies image noise. In this paper, we propose an effective color correction pipeline for a noisy image. The proposed pipeline consists of two parts: color correction and denoising. In the color correction part, we utilize spatially varying color correction (SVCC) that adaptively calculates the color correction matrices for each local image block considering the noise effect. Although the SVCC can effectively suppress the noise amplification, noise is still present in the color corrected image, where the noise levels vary spatially for each local block. In the denoising part, we propose an effective denoising framework for the color corrected image with spatially varying noise levels. Experimental results demonstrate that the proposed color correction pipeline outperforms existing algorithms for various noise levels.","PeriodicalId":6521,"journal":{"name":"2016 IEEE International Conference on Image Processing (ICIP)","volume":"92 1","pages":"4002-4006"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89966764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive block truncation coding image compression technique using optimized dot diffusion","authors":"Yun-Fu Liu, Jing-Ming Guo, Yu Cheng","doi":"10.1109/ICIP.2016.7532736","DOIUrl":"https://doi.org/10.1109/ICIP.2016.7532736","url":null,"abstract":"Block truncation coding (BTC) has been considered a highly efficient compression technique for decades, but the blocking artifact is its main issue. Halftoning-based BTC has significantly eased this issue, yet it introduces an apparent impulse-noise artifact. In this study, an improved BTC, termed adaptive dot-diffused BTC (ADBTC), is proposed to further improve visual quality. This method also provides additional flexibility in determining compression ratios, in contrast to the fixed and limited configurations of former methods. As documented in the experimental results, the proposed method achieves superior image quality according to five objective IQA methods. As a result, it is a very competitive approach for applications requiring both high frame rates and high-resolution image compression.","PeriodicalId":6521,"journal":{"name":"2016 IEEE International Conference on Image Processing (ICIP)","volume":"9 1","pages":"2137-2141"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82220282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}