{"title":"Detection of salient objects in computer synthesized images based on object-level contrast","authors":"L. Dong, Weisi Lin, Yuming Fang, Shiqian Wu, S. H. Soon","doi":"10.1109/VCIP.2013.6706362","DOIUrl":"https://doi.org/10.1109/VCIP.2013.6706362","url":null,"abstract":"In this work, we propose a method to detect visually salient objects in computer synthesized images from 3D meshes. Different from existing detection methods on graphic saliency which compute saliency based on pixel-level contrast, the proposed method computes saliency by measuring object-level contrast of each object to the other objects in a rendered image. Given a synthesized image, the proposed method first extracts dominant colors from each object, and represents each object with the dominant color descriptor (DCD). Saliency is measured as the contrast between the DCD of the object and the DCDs of its surrounding objects. We evaluate the proposed method on a data set of computer rendered images, and the results show that the proposed method obtains much better performance compared with existing related methods.","PeriodicalId":407080,"journal":{"name":"2013 Visual Communications and Image Processing (VCIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123739467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video viewer state estimation using gaze tracking and video content analysis","authors":"Jae-Woo Kim, Jong-Ok Kim","doi":"10.1109/VCIP.2013.6706365","DOIUrl":"https://doi.org/10.1109/VCIP.2013.6706365","url":null,"abstract":"In this paper, we propose a novel viewer state model based on gaze tracking and video content analysis. There are two primary contributions in this paper. We first improve gaze state classification significantly by combining video content analysis. Then, based on the estimated gaze state, we propose a novel viewer state model indicating both viewer's interest and existence of viewer's ROIs. Experiments were conducted to verify the performance of the proposed gaze state classifier and viewer state model. The experimental results show that the use of video content analysis in gaze state classification considerably improves the classification results and consequently, the viewer state model correctly estimates the interest state of video viewers.","PeriodicalId":407080,"journal":{"name":"2013 Visual Communications and Image Processing (VCIP)","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127154044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Apriori-like algorithm for automatic extraction of the common action characteristics","authors":"Tran Thang Thanh, Fan Chen, K. Kotani, H. Le","doi":"10.1109/VCIP.2013.6706394","DOIUrl":"https://doi.org/10.1109/VCIP.2013.6706394","url":null,"abstract":"With the development of the technology like 3D specialized markers, we could capture the moving signals from marker joints and create a huge set of 3D action MoCap data. The more we understand the human action, the better we could apply it to applications like security, analysis of sports, game etc. In order to find the semantically representative features of human actions, we extract the sets of action characteristics which appear frequently in the database. We then propose an Apriori-like algorithm to automatically extract the common sets shared by different action classes. The extracted representative action characteristics are defined in the semantic level, so that it better describes the intrinsic differences between various actions. In our experiments, we show that the knowledge extracted by this method achieves high accuracy of over 80% in recognizing actions on both training and testing data.","PeriodicalId":407080,"journal":{"name":"2013 Visual Communications and Image Processing (VCIP)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127928523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Object co-segmentation based on directed graph clustering","authors":"Fanman Meng, Bing Luo, Chao Huang","doi":"10.1109/VCIP.2013.6706376","DOIUrl":"https://doi.org/10.1109/VCIP.2013.6706376","url":null,"abstract":"In this paper, we develop a new algorithm to segment multiple common objects from a group of images. Our method consists of two aspects: directed graph clustering and prior propagation. The clustering is used to cluster the local regions of the original images and generate the foreground priors from these clusterings. The second step propagates the prior of each class and locates the common objects from the images in terms of foreground map. Finally, we use the foreground map as the unary term of Markov random field segmentation and segment the common objects by graph-cuts algorithm. We test our method on FlickrMFC and ICoseg datasets. The experimental results show that the proposed method can achieve larger accuracy compared with several state-of-arts co-segmentation methods.","PeriodicalId":407080,"journal":{"name":"2013 Visual Communications and Image Processing (VCIP)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121273480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Oriented geodesic distance based non-local regularisation approach for optic flow estimation","authors":"Shan Yu, D. Molloy","doi":"10.1109/VCIP.2013.6706439","DOIUrl":"https://doi.org/10.1109/VCIP.2013.6706439","url":null,"abstract":"Optical flow (OF) estimation needs spatial coherence regularisation, due to local image noise and the well-known aperture problem. More recently, OF local-region regularisation has been extended to larger or non-local region of regularisation to further deal with the aperture problem. After careful literature review, it has been determined that the criteria used for deciding the degree of motion coherence can be further improved. For this reason, we propose an oriented geodesic distance based motion regularisation scheme. The proposed approach is particular useful in reducing errors in estimating motions along object boundaries, and recovering motions for nearby objects with similar appearance. Experiment results, compared to leading-edge non-local regularisation schemes, have confirmed the superior performance of the proposed approach.","PeriodicalId":407080,"journal":{"name":"2013 Visual Communications and Image Processing (VCIP)","volume":"320 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126024259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Depth from defocus and blur for single image","authors":"Huadong Sun, Zhijie Zhao, Xuesong Jin, Lianding Niu, Lizhi Zhang","doi":"10.1109/VCIP.2013.6706352","DOIUrl":"https://doi.org/10.1109/VCIP.2013.6706352","url":null,"abstract":"Depth for single image is a hot problem in computer vision, which is very important to 2D/3D image conversion. Generally, depth of the object in the scene varies with the amount of blur in the defocus images. So, depth in the scene can be recovered by measuring the blur. In this paper, a new method for depth estimation based on focus/defocus cue is presented, where the entropy of high frequency subband of wavelet decomposition is regarded as the measure of blur. The proposed method, which is unnecessary to select threshold, can provide pixel-level depth map. The experimental results show that this method is effective and reliable.","PeriodicalId":407080,"journal":{"name":"2013 Visual Communications and Image Processing (VCIP)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127036753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low complexity image matching using color based SIFT","authors":"Abhishek Nagar, A. Saxena, S. Bucak, Felix C. A. Fernandes, Kong-Posh Bhat","doi":"10.1109/VCIP.2013.6706456","DOIUrl":"https://doi.org/10.1109/VCIP.2013.6706456","url":null,"abstract":"Image matching and search is gaining significant commercial importance nowadays due to various applications it enables such as augmented reality, image-queries for internet search, etc. Many researchers have effectively used color information in an image to improve its matching accuracy. These techniques, however, cannot be directly used for large scale mobile visual search applications that pose strict constraints on the size of the extracted features, computational resources and the system accuracy. To overcome this limitation, we propose a new and effective technique to incorporate color information that can use the SIFT extraction technique. We conduct our experiments on a large dataset containing around 33, 000 images that is currently being investigated in the MPEG-Compact Descriptors for Visual Search Standard and show substantial improvement compared to baseline.","PeriodicalId":407080,"journal":{"name":"2013 Visual Communications and Image Processing (VCIP)","volume":"203 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128703565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Color image guided locality regularized representation for Kinect depth holes filling","authors":"Jinhui Hu, R. Hu, Zhongyuan Wang, Yan Gong, Mang Duan","doi":"10.1109/VCIP.2013.6706366","DOIUrl":"https://doi.org/10.1109/VCIP.2013.6706366","url":null,"abstract":"The emergence of Microsoft Kinect has attracted the attention not only from consumers but also from researchers in the field of computer vision. It facilitates the possibility to capture the depth map of the scene in real time and with low cost. Nonetheless, due to the limitations of structured light measurements used by Kinect, the captured depth map suffers random depth missing in the occlusion or smooth regions, which affects the accuracy of many Kinect based applications. In order to fill in the holes existing in Kinect depth map, some approaches that adopted color image guided in-painting or joint bilateral filter have been proposed to represent the missing depth pixel by available depth pixels. However, they are not able to obtain the optimal weights, thus the obtained missing depth values are not best. In this paper, we propose a color image guided locality regularized representation (CGLRR) to reconstruct the missing depth pixels by comprehensively determining the optimal weights of the available depth pixels from collocated patches in color image. Experimental results demonstrate that the proposed algorithm can better fill in the holes of depth map both in smooth and edge region than previous works.","PeriodicalId":407080,"journal":{"name":"2013 Visual Communications and Image Processing (VCIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130844467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Retina model inspired image quality assessment","authors":"Guangtao Zhai, A. Kaup, Jia Wang, Xiaokang Yang","doi":"10.1109/VCIP.2013.6706367","DOIUrl":"https://doi.org/10.1109/VCIP.2013.6706367","url":null,"abstract":"We proposed in this paper a retina model based approach for image quality assessment. The retinal model is consisted of an optical modulation transfer module and an adaptive low-pass filtering module. We treat the model as a black box and design the adaptive filter using an information theoretical approach. Since the information rate of visual signals is far beyond the processing power of the human visual system, there must be an effective data reduction stage in human visual brain. Therefore, the underlying assumption for the retina model is that the retina reduces the data amount of the visual scene while retaining as much useful information as possible. For full reference image quality assessment, the original and distorted images pass through the retinal filter before some kind of distance is calculated between the images. Retina filtering can serve as a general preprocessing stage for most existing image quality metrics. We show in this paper that retina model based MSE/PSNR, though being straightforward, has already state of the art performance on several image quality databases.","PeriodicalId":407080,"journal":{"name":"2013 Visual Communications and Image Processing (VCIP)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131026243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Visually lossless screen content coding using HEVC base-layer","authors":"Geert Braeckman, Shahid M. Satti, Heng Chen, A. Munteanu, P. Schelkens","doi":"10.1109/VCIP.2013.6706364","DOIUrl":"https://doi.org/10.1109/VCIP.2013.6706364","url":null,"abstract":"This paper presents a novel two-layer coding framework targeting visually lossless compression of screen content video. The proposed framework employs the conventional HEVC standard for the base-layer. For the enhancement layer, a hybrid of spatial and temporal block-prediction mechanism is introduced to guarantee a small energy of the error-residual. Spatial prediction is generally chosen for dynamic areas, while temporal predictions yield better prediction for static areas in a video frame. The prediction residual is quantized based on whether a given block is static or dynamic. Run-length coding, Golomb based binarization and context-based arithmetic coding are employed to efficiently code the quantized residual and form the enhancement-layer. Performance evaluations using 4:4:4 screen content sequences show that, for visually lossless video quality, the proposed system significantly saves the bit-rate compared to the two-layer lossless HEVC framework.","PeriodicalId":407080,"journal":{"name":"2013 Visual Communications and Image Processing (VCIP)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129766126","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}