{"title":"A reconfiguration system for video decoder","authors":"Tao Xi, H. Qi, Dandan Ding, Lu Yu","doi":"10.1109/VCIP.2014.7051570","DOIUrl":"https://doi.org/10.1109/VCIP.2014.7051570","url":null,"abstract":"This demonstration system shows a kind of video decoder's implementation in Reconfigurable Video Coding (RVC) framework on Open RVC-CAL Compiler (Orcc) platform. Differently from tradition video decoder, the reconfigurable video decoder is not a decoder conforming a special video coding standard, but dynamically built according to actual bitsteams, which may not conform any standard. The reconfigurable video decoder receives not only the compressed video bitstream but also the decoder description. As an example, in this demo, we reconfigure AVS and H.264/AVC decoders using Just-In-Time Adaptive Decoder Engine (Jade).","PeriodicalId":166978,"journal":{"name":"2014 IEEE Visual Communications and Image Processing Conference","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124779985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Solving dense stereo matching via quadratic programming","authors":"Rui Ma, O. Au, Pengfei Wan, Wenxiu Sun, Lingfeng Xu, Luheng Jia","doi":"10.1109/VCIP.2014.7051583","DOIUrl":"https://doi.org/10.1109/VCIP.2014.7051583","url":null,"abstract":"We study the problem of formulating the discrete dense stereo matching using continuous convex optimization. One of the previous work derived a relaxed convex formulation by establishing the relationship between the disparity vector and a warping matrix. However it suffers from high computational complexity. In this paper, the previous convex formulation is translated into an equivalent quadratic programming (QP). Then redundant variables and constraints are eliminated by exploiting the internal sparse property of the warping matrix. The resulting QP can be efficiently tackled using interior point solvers. Moreover, enhanced smoothness term and effective post-processing procedures are also incorporated to further improve the disparity accuracy. Experimental results show that the proposed method is much faster and better than the previous convex formulation, and provides competitive results against existing convex approaches.","PeriodicalId":166978,"journal":{"name":"2014 IEEE Visual Communications and Image Processing Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129366044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"View synthesis prediction via motion field synthesis for 3D video coding","authors":"S. Shimizu, Shiori Sugimoto, Akira Kojima","doi":"10.1109/VCIP.2014.7051530","DOIUrl":"https://doi.org/10.1109/VCIP.2014.7051530","url":null,"abstract":"View synthesis prediction is critical for efficient compression of 3D video, which consists of multiview video and depth maps. However, its performance is limited in practical situations since it is necessary to use erroneous depth information and to perform block-based compensation instead of pixel-based warping. This paper proposes a novel view synthesis prediction scheme where motion field is synthesized by utilizing coarse disparity filed derived from erroneous depth information. As part of the proposed depth-based motion field synthesis, occlusion-aware backward mapping and 3D motion field warping are performed. In order to improve prediction performance with block-based compensation, an adaptive prediction sample generation that utilizes both temporal and inter-view correlations is also proposed. Experiments show that the proposed scheme achieves average bitrate reductions of 1.38% and 1.19% for coded views and synthesized views. The maximum gain is 11.57% for dependent view.","PeriodicalId":166978,"journal":{"name":"2014 IEEE Visual Communications and Image Processing Conference","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132713977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatial pyramid VLAD","authors":"Renhao Zhou, Qingsheng Yuan, Xiaoguang Gu, Dongming Zhang","doi":"10.1109/VCIP.2014.7051576","DOIUrl":"https://doi.org/10.1109/VCIP.2014.7051576","url":null,"abstract":"In recent years, VLAD has become a popular method which encoding powerful local descriptors to the compact representations. By using this approach, an image can be represented by just a few dozen bytes while preserving excellent retrieval results after the dimensionality reduction and compression. However, throwing away the spatial information is one of the biggest weaknesses of VLAD. This paper adopts the spatial pyramid pooling method to incorporate the spatial information into the VLAD vectors. Furthermore, a new normalization method is proposed to hold this advantage. By the proposed method, the performance of VLAD can be boosted through combining spatial information. The experimental results show that our approach outperforms VLAD in almost all configurations.","PeriodicalId":166978,"journal":{"name":"2014 IEEE Visual Communications and Image Processing Conference","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130203808","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Person re-identification via region-of-interest based features","authors":"Jianlou Si, Honggang Zhang, Chun-Guang Li","doi":"10.1109/VCIP.2014.7051551","DOIUrl":"https://doi.org/10.1109/VCIP.2014.7051551","url":null,"abstract":"Person re-identification is still a challenging task due to large visual appearance variations caused by illumination, background, viewpoints and poses in multi-camera surveillance. To address these challenges, many methods have been proposed. In this paper, we present an efficient method, called Region-of-Interest based Features (ROIF), via combining textural and chromatic features. It consists of two main phases - region-of-interest exploration from image and features extraction from ROI. Experimental results on the database VIPeR show that our method can yield promising accuracy with a quite cheap time cost.","PeriodicalId":166978,"journal":{"name":"2014 IEEE Visual Communications and Image Processing Conference","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131619925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An efficient framework for image interpolation using weighted surface approximation","authors":"Jingyang Wen, Y. Wan","doi":"10.1109/VCIP.2014.7051579","DOIUrl":"https://doi.org/10.1109/VCIP.2014.7051579","url":null,"abstract":"Although it has been recognized that different textual contents in an image need to be treated differently during accurate image interpolation, how to classify these contents well has been a difficult problem due to the inherent complexity in natural images. In this paper we propose an efficient image interpolation framework with a novel weighted surface approximation approach. The key is that the weighted mean squared error of the approximation can be converted to a continuously distributed probability of a pixel belonging to a local smooth region or a textural one, thus essentially making a soft pixel classification. In addition, the fitted local surface provides an estimate of the pixel value under the smooth region assumption. This estimate is then fused with the estimate from the texture region assumption using the previously obtained probability to yield the final estimate. Experimental results show that the proposed framework consistently improves over typical state-of-the-art methods in terms of interpolation accuracy while maintaining comparable computational complexity.","PeriodicalId":166978,"journal":{"name":"2014 IEEE Visual Communications and Image Processing Conference","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134202969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiview video representations for quality-scalable navigation","authors":"A. D. Abreu, L. Toni, Thomas Maugey, N. Thomos, P. Frossard, F. Pereira","doi":"10.1109/VCIP.2014.7051562","DOIUrl":"https://doi.org/10.1109/VCIP.2014.7051562","url":null,"abstract":"Interactive multiview video (IMV) applications offer to users the freedom of selecting their preferred viewpoint. Usually, in these systems texture and depth maps of captured views are available at the user side, as they permit the rendering of intermediate virtual views. However, the virtual views' quality depends on the distance to the available views used as references and on their quality, which is generally constrained by the heterogeneous capabilities of the users. In this context, this work proposes an IMV scalable system, where views are optimally organized in layers, each one offering an incremental improvement in the interactive navigation quality. We propose a distortion model for the rendered virtual views and an algorithm that selects the optimal views' subset per layer. Simulation results show the efficiency of the proposed distortion model, and that the careful choice of reference cameras permits to have a graceful quality degradation for clients with limited capabilities.","PeriodicalId":166978,"journal":{"name":"2014 IEEE Visual Communications and Image Processing Conference","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126625335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tag-based social image search with hyperedges correlation","authors":"Leiquan Wang, Zhicheng Zhao, Fei Su","doi":"10.1109/VCIP.2014.7051573","DOIUrl":"https://doi.org/10.1109/VCIP.2014.7051573","url":null,"abstract":"In social image search, most existing hypergraph methods use the visual and textual features in isolation by treating each feature term as a hyperedge. Nevertheless, they neglect the correlations of visual and textual hyperedges, which are more robust to represent the high-order relationship among vertices. In this paper, we propose a hypergraph with correlated hyperedges (CHH), which introduces high-order relationship of hyperedges into hypergraph learning. Based on CHH, a pairwise visual-textual correlation hypergraph (VTCH) model is used for tag-based social image search. To overcome the large number of newly generated hybrid hyperedges, a bagging-based method is adopted to balance the accuracy and speed. Finally, adaptive hyperedges learning method is used to obtain the relevance score for social image search. The experiments conducted on MIR Flickr show the effectiveness of our proposed method.","PeriodicalId":166978,"journal":{"name":"2014 IEEE Visual Communications and Image Processing Conference","volume":"363 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121723298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast multiple-view denoising based on image reconstruction by plane sweeping","authors":"Mari Miyata, K. Kodama, T. Hamamoto","doi":"10.1109/VCIP.2014.7051606","DOIUrl":"https://doi.org/10.1109/VCIP.2014.7051606","url":null,"abstract":"Denoising is important in image processing because degradation by noise affects not only the quality of captured images but also the performance of visual applications that use them. For example, under low light levels, it is difficult to accurately estimate scene depths using noisy stereo images. Conventional methods for denoising find similar regions on an image or among multiple images by block matching(BM) to integrate them for suppressing noise effectively. However, such exhaustive BM incurs considerable costs for real-time applications, in particular, when multi-view images(MVI) are involved. We use view-dependent plane sweeping(PS) for image reconstruction to achieve effective MVI denoising with low computational cost. We use PS for converting MVI to multi-focus images(MFI) to suppress their noise. Then, we find regions in focus on the MFI solely by comparing them with the target view image. Finally, we simply merge the regions to obtain reconstructed images in which their noise is effectively suppressed.","PeriodicalId":166978,"journal":{"name":"2014 IEEE Visual Communications and Image Processing Conference","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127401476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving a vision indoor localization system by a saliency-guided detection","authors":"Wael Elloumi, Kamel Guissous, A. Chetouani, S. Treuillet","doi":"10.1109/VCIP.2014.7051526","DOIUrl":"https://doi.org/10.1109/VCIP.2014.7051526","url":null,"abstract":"In this paper, we propose to use visual saliency to improve an indoor localization system based on image matching. A learning step permits to determinate the reference trajectory by selecting some key frames along the path. During the localization step, the current image is then compared to the obtained key frames in order to estimate the user's position. This comparison is realized by extracting primitive information through a saliency method, which aims to improve our localization system by focusing our attention on the more singular regions to match. Another advantage of the saliency-guided detection is to save computation time. The proposed framework has been developed and tested on a Smartphone. The obtained results show the interest of the use of saliency models by comparing the numbers of features and good matches in video sequence.","PeriodicalId":166978,"journal":{"name":"2014 IEEE Visual Communications and Image Processing Conference","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121107600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}