{"title":"Adaptive Depth edge sharpening for 3D video depth coding","authors":"Rong Zhang, Ying Chen, M. Karczewicz","doi":"10.1109/VCIP.2012.6410849","DOIUrl":"https://doi.org/10.1109/VCIP.2012.6410849","url":null,"abstract":"In 3D video systems with Multiview Video plus Depth (MVD) representation, intermediate views can be rendered from transmitted texture views and corresponding depth maps by techniques such as Depth Image Based Rendering (DIBR). Recent standardization activities in MPEG include the development of such MVD-based 3DV codecs. One such codec, based on H.264/AVC, is called 3DV-ATM. Because of compression, reconstructed depth maps often have certain distortions, such as blurry depth edges, which can result in noticeable artifacts in the rendered views. In this paper, a method of adaptive depth edge sharpening is proposed for 3D video coding, based on the 3DV-ATM. The proposed adaptive depth edge filtering and smoothing-along-depth-edge techniques adaptively sharpen the blurry edges that compression introduces into reconstructed depth frames. Compared to the anchor software 3DV-ATM, the proposed method achieves about 7.2% average bitrate reduction, measured as rendered-view PSNR versus overall bitrate, with comparable runtime complexity.","PeriodicalId":103073,"journal":{"name":"2012 Visual Communications and Image Processing","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115161062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A distributed context-free grammars learning algorithm and its application in video classification","authors":"Jing Huang, D. Schonfeld","doi":"10.1109/VCIP.2012.6410829","DOIUrl":"https://doi.org/10.1109/VCIP.2012.6410829","url":null,"abstract":"In this paper, we propose a novel statistical estimation algorithm for stochastic context-sensitive grammars (SCSGs). First, we show that the SCSG model can be solved by decomposing it into several causal stochastic context-free grammar (SCFG) models, each of which can be solved simultaneously using a fully synchronous distributed computing framework. An approximate solution to multiple SCFGs, based on an alternate updating scheme, is also provided under the assumption of a realistic sequential computing framework. A series of statistical algorithms is then applied to learn the SCFGs. The SCSGs can then be used to represent multiple trajectories. Experimental results demonstrate the improved performance of our method compared with existing methods for multiple-trajectory classification.","PeriodicalId":103073,"journal":{"name":"2012 Visual Communications and Image Processing","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121883060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mode adaptive reference frame denoising for high fidelity compression in HEVC","authors":"Eugen Wige, Gilbert Yammine, Wolfgang Schnurrer, André Kaup","doi":"10.1109/VCIP.2012.6410777","DOIUrl":"https://doi.org/10.1109/VCIP.2012.6410777","url":null,"abstract":"A new video coding standard, High Efficiency Video Coding (HEVC), is currently under development. In this paper we propose two relatively low-complexity adaptive Wiener schemes for efficient P-frame coding of noisy videos with HEVC. In the proposed in-loop denoising framework, the reference frame is noise filtered for P-frame prediction. The introduced algorithms adapt to the HEVC coding structure and thus can efficiently model the noise within the reference frame. The simulation results show that, on the one hand, considerable compression gains can be achieved using the proposed in-loop denoising framework. On the other hand, the developed algorithms decrease the encoder runtime but at the same time increase the decoder runtime.","PeriodicalId":103073,"journal":{"name":"2012 Visual Communications and Image Processing","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130332303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lightweight searchable screen video recording","authors":"Mattias Marder, A. Geva, Yaoping Ruan","doi":"10.1109/VCIP.2012.6410783","DOIUrl":"https://doi.org/10.1109/VCIP.2012.6410783","url":null,"abstract":"Command logging of maintenance and operation activities of modern computer systems has become an integral component of customer and audit requirements. In recent years, this logging has usually been achieved via desktop video recording. However, the conventional approach of video recording requires high computation overhead, high network bandwidth, and a large storage size. Searching through video files is also a challenge. In this paper, we present a lossy but text-preserving compression scheme that meets these challenges by creating a sparse bitonal image suitable for optical character recognition (OCR). Using our system for auditing, the bitonal image gets stored on a server. Due to the mechanism's text-preserving compression, we can apply OCR off-line to create annotations of each video frame, making the output searchable. Compared to state-of-the-art compression of raw video, our approach can reduce file size by 50-80%, while using CPU and memory resources similar to other methods.","PeriodicalId":103073,"journal":{"name":"2012 Visual Communications and Image Processing","volume":"111 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115697432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stereoscopic video quality assessment model based on spatial-temporal structural information","authors":"Jingjing Han, Tingting Jiang, Siwei Ma","doi":"10.1109/VCIP.2012.6410736","DOIUrl":"https://doi.org/10.1109/VCIP.2012.6410736","url":null,"abstract":"Most of the existing 3D video quality assessment methods estimate the quality of each view independently and then pool them into a unique objective score. Moreover, they seldom take the motion information of adjacent frames into consideration. In this paper, we propose an effective stereoscopic video quality assessment method which focuses on the inter-view correlation of spatial-temporal structural information extracted from adjacent frames. The metric jointly represents and evaluates the two views. By selecting salient pixels to be processed and discarding the others, the processing speed is significantly improved. Experimental results on our stereoscopic video database show that the proposed algorithm correlates well with subjective scores.","PeriodicalId":103073,"journal":{"name":"2012 Visual Communications and Image Processing","volume":"398 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115925323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic sports video genre categorization for broadcast videos","authors":"Yuan Dong, Jiwei Zhang, Xiaofu Chang, Jian Zhao","doi":"10.1109/VCIP.2012.6410850","DOIUrl":"https://doi.org/10.1109/VCIP.2012.6410850","url":null,"abstract":"A novel sports genre categorization algorithm based on representative shot extraction and geometry visual phrases (GVP) is presented in this paper. The performance of sports classification can be observably improved by generating a reduced image set containing representative information and encoding spatial information into the bag-of-words (BOW) model. First, shots containing significant information of the videos are chosen by key-frame clustering. Second, GVP are found from the co-occurrence of visual words in a spatial layout based on the scale invariant feature transform (SIFT). Then visual words and GVP are concatenated to form enhanced histograms before an SVM-based classification procedure. Compared with most existing methods, our algorithm is domain-knowledge free as well as fully automatic, and thus provides better extensibility. Experiments on a database of 10 sport genres with over 10257 minutes of videos from different sources achieved an average accuracy of 87.3%, which validates the robustness of our proposed algorithm on a large-scale database.","PeriodicalId":103073,"journal":{"name":"2012 Visual Communications and Image Processing","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122135199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis of video codec buffer and delay under time-varying channel","authors":"Z. Chen, Y. Reznik","doi":"10.1109/VCIP.2012.6410794","DOIUrl":"https://doi.org/10.1109/VCIP.2012.6410794","url":null,"abstract":"In this paper, we analyze the effect of time-varying channels on the video codec buffer, especially for low-delay applications. We derive sufficient conditions under which an encoder can design a bitstream for any time-varying channel without decoder buffer overflow and underflow. We then apply those conditions to design a bandwidth-adaptive rate control in x264 and test it under an LTE simulator. Our test results show significant improvements in delay and delay jitter over traditional leaky bucket models.","PeriodicalId":103073,"journal":{"name":"2012 Visual Communications and Image Processing","volume":"188 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122728465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Demonstration of a multimedia player supporting the MPEG-DASH protocol","authors":"Herc Kwan, C. Roy, Deepak Das","doi":"10.1109/VCIP.2012.6410861","DOIUrl":"https://doi.org/10.1109/VCIP.2012.6410861","url":null,"abstract":"A multimedia player that supports the DASH protocol has been developed for devices running either Android or Apple iOS. This media player is capable of supporting different profiles in the ISO Base Media File Format and MPEG-2 Transport Stream Media Segment Format. Our demonstration includes playback of live and VOD content, both of which are streamed from a web server located hundreds of miles away from the conference venue.","PeriodicalId":103073,"journal":{"name":"2012 Visual Communications and Image Processing","volume":"216 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124254939","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rectangular partitioning for Intra prediction in HEVC","authors":"Shan Liu, Ximin Zhang, S. Lei","doi":"10.1109/VCIP.2012.6410764","DOIUrl":"https://doi.org/10.1109/VCIP.2012.6410764","url":null,"abstract":"This paper presents a mechanism of using rectangular prediction blocks for Intra prediction in the emerging High Efficiency Video Coding (HEVC) standard. In the previous and current video coding standards, Intra predictions have been processed on square blocks, e.g. 16×16, 8×8 and 4×4 pixel blocks in H.264/MPEG4 AVC, and 2N×2N or N×N square prediction units (PU) in the current HEVC. In this paper, it is proposed to include 2N×N and N×2N prediction block sizes (or PU types) in a 2N×2N coding unit (CU) for the Intra prediction, as in the Inter prediction. Experimental results show that, together with other tools (e.g. NSQT), significant coding gain can be achieved compared with the HM5.0 anchor: an average 1.8% BD-rate reduction with the use of 2-point prediction and transforms, or 1.4% BD-rate reduction without them (for the \"All Intra\" configuration).","PeriodicalId":103073,"journal":{"name":"2012 Visual Communications and Image Processing","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121296301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel particle filtering framework for 2D-TO-3D conversion from a monoscopic 2D image sequence","authors":"Jing Huang, D. Schonfeld","doi":"10.1109/VCIP.2012.6410835","DOIUrl":"https://doi.org/10.1109/VCIP.2012.6410835","url":null,"abstract":"This paper presents a novel 2D-to-3D conversion approach from a monoscopic 2D image sequence. We propose a particle filter framework for recursive recovery of point-wise depth from feature correspondences matched through image sequences. We formulate a novel 2D dynamics model for recursive depth estimation that combines a camera model, a structure model and a translation model. The proposed method utilizes edge-detection-assisted scale-invariant features to compensate for the lack of edge features in the scale-invariant feature transform (SIFT). Furthermore, the depths in the depth map are computed and interpolated using 2D Delaunay triangulation. Finally, a stereo-view generation algorithm for multiple users is presented that uses the proposed dynamics model and particle filter framework. Experimental results show that our proposed framework yields superior results.","PeriodicalId":103073,"journal":{"name":"2012 Visual Communications and Image Processing","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126628277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}