{"title":"New bounds on image denoising: Viewpoint of sparse representation and non-local averaging","authors":"Jianzhou Feng, Li Song, X. Huo, Xiaokang Yang, Wenjun Zhang","doi":"10.1109/VCIP.2012.6410785","DOIUrl":"https://doi.org/10.1109/VCIP.2012.6410785","url":null,"abstract":"Image denoising plays a fundamental role in many image processing applications. Utilizing sparse representation and nonlocal averaging together is such a successful framework that leads to considerable progress in denoising. Almost all the newly proposed denoising algorithms are built base on it, different in detailed implementation, and the denoising performance seems converging. What is the denoising bound of this framework turns into a key question. In this paper, we assume all the possible algorithms under the framework can be approximated by a fixed two steps denoising process with different parameters. Step one cluster geometric similar image patches into groups so that patches within each group could be sparse represented under the basis of the group. Step two use the atoms of the group basis and radiometric similar patches of each patch for non-local averaging. The parameters of the process are the cluster number, the atoms and the number of radiometric similar patches for estimating each patch. Finally, the bound is derived as the minimum denoising error of all the possible parameters. Comparing with previous bounds, the new one is image specific and more practical. Experiment results show that there still exists room to improve the denoising performance for natural images.","PeriodicalId":103073,"journal":{"name":"2012 Visual Communications and Image Processing","volume":"148 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127243287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiple sign bits hiding for High Efficiency Video Coding","authors":"Jing Wang, Xiang Yu, Dake He, F. Henry, G. Clare","doi":"10.1109/VCIP.2012.6410753","DOIUrl":"https://doi.org/10.1109/VCIP.2012.6410753","url":null,"abstract":"High Efficiency Video Coding (HEVC) is the next-generation video coding standard currently under development, which has demonstrated substantial bit savings (rate reduction by approximately half) compared to H.264/AVC. This paper presents the multiple sign bits hiding scheme that was adopted into the committee draft of HEVC at the 8th JCT-VC meeting. In HEVC, the quantized transform coefficients are entropy-coded in groups of 16 coefficients for each transform unit. With multiple sign bits hiding, for coefficient groups that satisfy certain conditions, the sign of the first non-zero coefficient along the scanning path is not explicitly transmitted in the bitstream and instead is inferred from the parity of the sum of all non-zero coefficients in that coefficient group at the decoder. To ensure the matching between the hidden sign and the parity of the sum of all non-zero coefficients, a parity adjustment method is employed at the encoder based on rate-distortion optimization or distortion minimization. Compared with conventional video coding schemes where quantization and coefficient coding are separately designed, the multiple sign bits hiding scheme in HEVC represents a joint quantization and coefficient coding design and provides consistent rate-distortion performance gains for all standard test sequences under standard test conditions.","PeriodicalId":103073,"journal":{"name":"2012 Visual Communications and Image Processing","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125513488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simplified AMVP for High Efficiency Video Coding","authors":"Liang Zhao, Xun Guo, S. Lei, Siwei Ma, Debin Zhao","doi":"10.1109/VCIP.2012.6410747","DOIUrl":"https://doi.org/10.1109/VCIP.2012.6410747","url":null,"abstract":"In High Efficiency Video Coding (HEVC), advanced motion vector prediction (AMVP) is adopted to predict current motion vector by utilizing a competition-based scheme from a given candidate set, which include both the spatial and temporal motion vectors. In order to enhance the practicability of the AMVP, a simplified AMVP is proposed. Firstly, by analyzing the importance of the spatial and temporal candidates, we reduce the number of the candidates involved in the competition set and simplify the redundancy checking process, which will decrease the complexity of the decoder as well as improve the robustness of the decoder. Secondly, we simplify the zero motion adding process which will occur only when the number of existing candidates is less than the predefined number. Experimental results show that the proposed scheme provides no loss in random access and low delay conditions. These two simplifications have been proposed and adopted into the HEVC standard.","PeriodicalId":103073,"journal":{"name":"2012 Visual Communications and Image Processing","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127845142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Segmentation-based view synthesis for three-dimensional video","authors":"Maziar Loghman, Joohee Kim","doi":"10.1109/VCIP.2012.6410810","DOIUrl":"https://doi.org/10.1109/VCIP.2012.6410810","url":null,"abstract":"This paper investigates the use of segmentation in view synthesis for three-dimensional video. View synthesis is the process of generating novel views of a scene, using a set of views as the reference. Recently, several techniques that use depth maps for rendering virtual views have been suggested. However, inaccuracy in depth maps causes annoying visual artifacts in depth-based view synthesis. This paper presents an efficient depth image-based rendering technique based on segmentation using multi-level thresholding. In the proposed algorithm, first all the images are segmented according to the depth and the pixels belonging to different objects are warped and blended independently. Based on multi-level thresholding, an algorithm for finding the ghost contour pixels is provided which simplifies the computations. A novel inpainting method for disocclusions has been introduced which uses the segmented images to find the associated background boundary pixels. The experimental results show that the proposed algorithm improves the PSNR of the synthesized views up to 0.68 dB for the multi-view video test sequences and eliminates the annoying visual artifacts.","PeriodicalId":103073,"journal":{"name":"2012 Visual Communications and Image Processing","volume":"111 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131629688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Assessing photo quality with geo-context and crowdsourced photos","authors":"Wenyuan Yin, Tao Mei, Chang Wen Chen","doi":"10.1109/VCIP.2012.6410821","DOIUrl":"https://doi.org/10.1109/VCIP.2012.6410821","url":null,"abstract":"Automatic photo quality assessment emerged as a hot topic in recent years for its potential in numerous applications. Most existing approaches to photo quality assessment have predominantly focused on image content itself, while ignoring various contexts such as the associated geo-location and timestamp. However, such a universal aesthetic assessment model may not work well with significantly different contexts, since the photography rules are always scene and context dependent. In real cases, professional photographers use different photography knowledge when shooting various scenes in different places. Motivated by this observation, we leverage the geo-context information associated with photos for visual quality assessment. Specifically, we propose in this paper a Scene-Dependent Aesthetic Model (SDAM) to assess photo quality, by jointly leveraging the geo-context and visual content. Geo-contextual leveraged searching is performed to obtain relevant images with similar content to discover the scene-dependent photography principles for accurate photo quality assessment. To overcome the problem that in many cases the number of the contextually searched images is insufficient for learning the SDAM, we adopt transfer learning to utilize auxiliary photos within the same scene category from other locations for learning photography rules. Extensive experiments shows that the proposed SDAM scheme indeed improves the photo quality assessment accuracy via leveraging photo geo-contexts, compared with traditional universal aesthetic models.","PeriodicalId":103073,"journal":{"name":"2012 Visual Communications and Image Processing","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134204549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"3D tubular structure extraction using kernel-based superellipsoid model with Gaussian process regression","authors":"Qingxiang Zhu, Dayu Zheng, H. Xiong","doi":"10.1109/VCIP.2012.6410763","DOIUrl":"https://doi.org/10.1109/VCIP.2012.6410763","url":null,"abstract":"To analyze the tubular structure correctly and obtain a record of the centerlines has become significantly more challenging and infers countless applications in a large amount of fields. Hence, a robust and automated technique for extracting the centerlines of the tubular structure is required. To address complicated 3D tubular objects, a novel kernel-based modeling approach with regard to minimizing tracking energy is presented in this paper. The 3D tubular structure can be demonstrated as a kernel-based superellipsoid model with non-uniform weights. To improve the performance, Gaussian process is also introduced to update the parameters of the kernel-based model, especially for the complicated structure with cross sections, varying radii, and complicated branches. At last, the extensive experimental results on 3D tubular data demonstrate that our proposed method deals effectively with complicated tubular structure.","PeriodicalId":103073,"journal":{"name":"2012 Visual Communications and Image Processing","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127897322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low bit-rate video coding via mode-dependent adaptive regression for wireless visual communications","authors":"Xianming Liu, Xiaolin Wu, Xinwei Gao, Debin Zhao, Wen Gao","doi":"10.1109/VCIP.2012.6410852","DOIUrl":"https://doi.org/10.1109/VCIP.2012.6410852","url":null,"abstract":"In this paper, a practical video coding scheme is developed to realize state-of-the-art video coding efficiency with lower encoder complexity at low bit-rate, while supporting standard compliance and error resilience. Such an architecture is particularly attractive for wireless visual communications. At the encoder, multiple descriptions of a video sequence are generated in the spatio-temporal domain by temporal multiplexing and spatial adaptive downsampling. The resulting side descriptions are interleaved with each other in temporal domain, and still with conventional square sample grids in spatial domain. As such, each side description can be compressed without any change to existing video coding standards. At the decoder, each side description is first decompressed, and then reconstructed to original resolution with the help of the other side description. In this procedure, the decoder recover the original video sequence in a constrained least squares regression process, using 2D or 3D piecewise autoregressive model according to different prediction modes. In this way, the spatial and temporal correlation is sufficiently explored to achieve superior quality. Experiment results demonstrate the proposed video coding scheme outperforms H.264 in rate-distortion performance at low bit-rates and achieves superior visual quality at medium bit-rates as well.","PeriodicalId":103073,"journal":{"name":"2012 Visual Communications and Image Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127974358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Histogram-Based stereo matching under varying illumination conditions","authors":"Il-Lyong Jung, Jae-Young Sim, Chang-Su Kim","doi":"10.1109/VCIP.2012.6410819","DOIUrl":"https://doi.org/10.1109/VCIP.2012.6410819","url":null,"abstract":"A histogram-based matching algorithm for stereo images captured under different illumination conditions is proposed in this work. The cumulative histogram of an image represents the ranks of relative pixel brightness, which are robust to illumination changes. Therefore, we design the matching cost based on the similarity of the cumulative histograms of stereo images. As an optional mode, the proposed algorithm can evaluate the histograms for foreground objects and the background separately to alleviate occlusion artifacts. To determine the disparity of each pixel, the proposed algorithm adaptively aggregates matching costs based on the color similarity and the geometric proximity of neighboring pixels. Then, it refines false disparities at occluded pixels using more reliable disparities of non-occluded pixels. Experimental results demonstrate that the proposed algorithm provides higher quality disparity maps than the conventional methods under varying illumination conditions.","PeriodicalId":103073,"journal":{"name":"2012 Visual Communications and Image Processing","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116976474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video processing techniques for 3D television","authors":"Yo-Sung Ho","doi":"10.1109/VCIP.2012.6410853","DOIUrl":"https://doi.org/10.1109/VCIP.2012.6410853","url":null,"abstract":"In recent years, various multimedia services have become available and the demand for three-dimensional television (3DTV) is growing rapidly. Since 3DTV is considered as the next generation broadcasting service that can deliver real and immersive experiences by supporting user-friendly interactions, a number of advanced three-dimensional video technologies have been studied. Among them, multi-view video coding is the key technology for various applications including free-viewpoint video, free-viewpoint television, 3DTV, immersive teleconference, and surveillance systems. In this tutorial lecture, we are going to cover the current state-of-the-art technologies for 3D video: representation of 3D scenes, acquisition of 3D video contents, illumination compensation and color correction, camera calibration and image rectification, depth map modeling and enhancement, 3-D warping and depth map refinement, coding of multi-view video and depth map, hole filling for occluded objects, and view synthesis using homography. After defining the basic requirements for realistic 3D broadcasting services, we will cover various multi-modal immersive media processing technologies.","PeriodicalId":103073,"journal":{"name":"2012 Visual Communications and Image Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115831991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Intra prediction based on statistical modeling of images","authors":"Fatih Kamisli","doi":"10.1109/VCIP.2012.6410803","DOIUrl":"https://doi.org/10.1109/VCIP.2012.6410803","url":null,"abstract":"Intra prediction is an important part of intra-frame coding. A number of approaches have been proposed to improve intra prediction including a general linear prediction approach in which a weighted sum of all available neighbor pixels is used to predict each block pixel. An important part of this approach is the determination of the used weights. One method to determine the weights is to use the least-squares solution of an overdetermined linear system of weights. In this paper, we present an alternative approach where the weights are determined based on statistical modeling of image pixels. This approach results in an analytical expression for the weights and can achieve similar coding gains as methods based on least-squares solutions of overdetermined systems, while having several benefits such as reduced storage or computations.","PeriodicalId":103073,"journal":{"name":"2012 Visual Communications and Image Processing","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114765317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}