{"title":"Bilateral depth-discontinuity filter for novel view synthesis","authors":"Ismaël Daribo, H. Saito","doi":"10.1109/MMSP.2010.5662009","DOIUrl":"https://doi.org/10.1109/MMSP.2010.5662009","url":null,"abstract":"In this paper, a new filtering technique addresses the disocclusions problem issued from the depth image based rendering (DIBR) technique within 3DTV framework. An inherent problem with DIBR is to fill in the newly exposed areas (holes) caused by the image warping process. In opposition with multiview video (MVV) systems, such as free viewpoint television (FTV), where multiple reference views are used for recovering the disocclusions, we consider in this paper a 3DTV system based on a video-plus-depth sequence which provides only one reference view of the scene. To overcome this issue, disocclusion removal can be achieved by pre-processing the depth video and/or post-processing the warped image through hole-filling techniques. Specifically, we propose in this paper a pre-processing of the depth video based on a bilateral filtering according to the strength of the depth discontinuity. Experimental results are shown to illustrate the efficiency of the proposed method compared to the traditional methods.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127209054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rate-distortion optimized low-delay 3D video communications","authors":"E. Masala","doi":"10.1109/MMSP.2010.5661998","DOIUrl":"https://doi.org/10.1109/MMSP.2010.5661998","url":null,"abstract":"This paper focuses on the rate-distortion optimization of low-delay 3D video communications based on the latest H.264/MVC video coding standard. The first part of the work proposes a new low-complexity model for distortion estimation suitable for low-delay stereoscopic video communication scenarios such as 3D videoconferencing. The distortion introduced by the loss of a given frame is investigated and a model is designed in order to accurately estimate the impact that the loss of each frame would have on future frames. The model is then employed in a rate-distortion optimized framework for video communications over a generic QoS-enabled network. Simulations results show consistent performance gains, up to 1.7 dB PSNR, with respect to a traditional a priori technique based on frame dependency information only. Moreover, the performance is shown to be consistently close to the one of the prescient technique that has perfect knowledge of the distortion characteristics of future frames.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116827026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Challenging the security of Content-Based Image Retrieval systems","authors":"Thanh-Toan Do, Ewa Kijak, T. Furon, L. Amsaleg","doi":"10.1109/MMSP.2010.5661993","DOIUrl":"https://doi.org/10.1109/MMSP.2010.5661993","url":null,"abstract":"Content-Based Image Retrieval (CBIR) has been recently used as a filtering mechanism against the piracy of multimedia contents. Many publications in the last few years have proposed very robust schemes where pirated contents are detected despite severe modifications. As none of these systems have addressed the piracy problem from a security perspective, it is time to check whether they are secure: Can pirates mount violent attacks against CBIR systems by carefully studying the technology they use? This paper is an initial analysis of the security flaws of the typical technology blocks used in state-of-the-art CBIR systems. It is so far too early to draw any definitive conclusion about their inherent security, but it motivates and encourages further studies on this topic.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129772428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust background subtraction method based on 3D model projections with likelihood","authors":"Hiroshi Sankoh, A. Ishikawa, S. Naito, S. Sakazawa","doi":"10.1109/MMSP.2010.5662014","DOIUrl":"https://doi.org/10.1109/MMSP.2010.5662014","url":null,"abstract":"We propose a robust background subtraction method for multi-view images, which is essential for realizing free viewpoint video where an accurate 3D model is required. Most of the conventional methods determine background using only visual information from a single camera image, and the precise silhouette cannot be obtained. Our method employs an approach of integrating multi-view images taken by multiple cameras, in which the background region is determined using a 3D model generated by multi-view images. We apply the likelihood of background to each pixel of camera images, and derive an integrated likelihood for each voxel in a 3D model. Then, the background region is determined based on the minimization of energy functions of the voxel likelihood. Furthermore, the proposed method also applies a robust refining process, where a foreground region obtained by a projection of a 3D model is improved according to geometric information as well as visual information. A 3D model is finally reconstructed using the improved foreground silhouettes. 
Experimental results show the effectiveness of the proposed method compared with conventional works.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128415250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast environment extraction for lighting and occlusion of virtual objects in real scenes","authors":"François Fouquet, Jean-Philippe Farrugia, Brice Michoud, S. Brandel","doi":"10.1109/MMSP.2010.5662007","DOIUrl":"https://doi.org/10.1109/MMSP.2010.5662007","url":null,"abstract":"Augmented reality aims to insert virtual objects in real scenes. In order to obtain a coherent and realistic integration, these objects have to be relighted according to their positions and real light conditions. They also have to deal with occlusion by nearest parts of the real scene. To achieve this, we have to extract photometry and geometry from the real scene. In this paper, we adapt high dynamic range reconstruction and depth estimation methods to deal with real-time constraint and consumer devices. We present their limitations along with significant parameters influencing computing time and image quality. We tune these parameters to accelerate computation and evaluate their impact on the resulting quality. To fit with the augmented reality context, we propose a real-time extraction of these information from video streams, in a single pass.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130525906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gaussian mixture vector quantization-based video summarization using independent component analysis","authors":"Junfeng Jiang, Xiao-Ping Zhang","doi":"10.1109/MMSP.2010.5662062","DOIUrl":"https://doi.org/10.1109/MMSP.2010.5662062","url":null,"abstract":"In this paper, we propose a new Gaussian mixture vector quantization (GMVQ)-based method to summarize the video content. In particular, in order to explore the semantic characteristics of video data, we present a new feature extraction method using independent component analysis (ICA) and color histogram difference to build a compact 3D feature space first. A new GMVQ method is then developed to find the optimized quantization codebook. The optimal codebook size is determined by Bayes information criterion (BIC). The video frames that are the nearest-neighbours to the quanta in the GMVQ quantization codebook are sampled to summarize the video content. A kD-tree-based nearest-neighbour search strategy is employed to accelerate the search procedure. Experimental results show that our method is computationally efficient and practically effective to build a content-based video summarization system.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"134 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123900958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A resilient and low-delay P2P streaming system based on network coding with random multicast trees","authors":"Marco Toldo, E. Magli","doi":"10.1109/MMSP.2010.5662054","DOIUrl":"https://doi.org/10.1109/MMSP.2010.5662054","url":null,"abstract":"Network coding is known to provide increased throughput and reduced delay for communications over networks. In this paper we propose a peer-to-peer video streaming system that exploits network coding in order to achieve low start-up delay, high streaming rate, and high resiliency to peers' dynamics. In particular, we introduce the concept of random multicast trees as overlay topology. This topology offers all benefits of tree-based overlays, notably a short start-up delay, but is much more efficient at distributing data and recovering from ungraceful peers departures. We develop a push-based streaming system that leverages network coding to efficiently distribute the information in the overlay without using buffer maps. We show performance results of the proposed system and compare it with an optimized pull systems based on Coolstreaming, showing significant improvement.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123914599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Time-space acoustical feature for fast video copy detection","authors":"Y. Itoh, Masahiro Erokuumae, K. Kojima, M. Ishigame, Kazuyo Tanaka","doi":"10.1109/MMSP.2010.5662070","DOIUrl":"https://doi.org/10.1109/MMSP.2010.5662070","url":null,"abstract":"We propose a new time-space acoustical feature for fast video copy detection to search a video segment for a number of video streams to find illegal video copies on Internet video site and so on. We extract a small number of feature vectors from acoustically peculiar points that express the point of local maximum/minimum in the time sequence of acoustical power envelopes in video data. The relative values of the feature points are extracted, so called time-space acoustical feature, because the volume in the video stream differs in different recording environments. The features can be obtained quickly compared with representative features such as MFCC, and they require a short processing time for matching because the number and the dimension of each feature vector are both small. The accuracy and the computation time of the proposed method is evaluated using recorded TV movie programs for input data, and a 30 sec. −3 min. segment in DVD for reference data, assuming a copyright holder of a movie searches the illegal copies for video streams. 
We could confirm that the proposed method completed all processes within the computation time of the former feature extraction with 93.2% of F-measure in 3 minutes video segment detection.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114642681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Measuring errors for massive triangle meshes","authors":"Anis Meftah, Arnaud Roquel, F. Payan, M. Antonini","doi":"10.1109/MMSP.2010.5662050","DOIUrl":"https://doi.org/10.1109/MMSP.2010.5662050","url":null,"abstract":"Our proposal is a method for computing the distance between two surfaces modeled by massive triangle meshes which can not be both loaded entirely in memory. The method consists in loading at each step a small part of the two meshes and computing the symmetrical distance for these areas. These areas are chosen in such a way as the orthogonal projection, used to compute this distance, have to be in it. For this, one of the two meshes is simplified and then a correspondence between the simplified mesh and the triangles of the input meshes is done. The experiments show that the proposed method is very efficient in terms of memory cost, while producing results comparable to the existent tools for the small and medium size meshes. Moreover, the proposed method enables us to compute the distance for massive meshes.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127740408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video super-resolution for dual-mode digital cameras via scene-matched learning","authors":"Guangtao Zhai, Xiaolin Wu","doi":"10.1109/MMSP.2010.5662061","DOIUrl":"https://doi.org/10.1109/MMSP.2010.5662061","url":null,"abstract":"Many consumer digital cameras support dual shooting mode of both low-resolution (LR) video and high-resolution (HR) image. By periodically switching between the video and image modes, this type of cameras make it possible to super-resolve the LR video with the assistance of neighboring HR still images. We propose a model-based video super-resolution (VSR) technique for the above dual-mode cameras. A HR video frame is modeled as a 2D piecewise autoregressive (PAR) process. The PAR model parameters are learnt from the HR still images inserted between LR video frames. By registering the LR video frames and the HR still images, we base the learning on sample statistics that matches the scene to be constructed. The resulting PAR model is more accurate and robust than if the model parameters are estimated from the LR video frames without referring to the HR images or from a training set. Aided by the powerful scene-matched model the LR video frame is upsampled to the resolution of the HR image via adaptive interpolation. As such, the proposed VSR technique does not require explicit motion estimation of subpixel precision nor the solution of a large-scale inverse problem. 
The new VSR technique is competitive in visual quality against existing techniques with a fraction of the computational cost.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126209511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}