{"title":"Video segmentation using BIC and stacked scanning","authors":"King Yiu Tam, J. Lay, D. Levy","doi":"10.1109/ICME.2011.6012019","DOIUrl":"https://doi.org/10.1109/ICME.2011.6012019","url":null,"abstract":"This paper proposes an algorithm for automatic segmentation of video clips into speaker units, with the intention of using the latter as the index units for a video indexing and retrieval system. The algorithm works by pooling together global information of each speaker before detecting the true speaker change locations. The use of stacked scanning can offer better results. The experiment shows that the algorithm is able to boost the discriminating power of BIC resulting in 20% reduction in average detection offset and 5% reduction in average maximum detection offset.","PeriodicalId":433997,"journal":{"name":"2011 IEEE International Conference on Multimedia and Expo","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129302642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Binary tree decomposition depth coding for 3D video applications","authors":"Gonçalo Carmo, M. Naccari, F. Pereira","doi":"10.1109/ICME.2011.6012254","DOIUrl":"https://doi.org/10.1109/ICME.2011.6012254","url":null,"abstract":"Recent advances in three dimensional display technologies and the growing efforts put in the production of three dimensional videos are intensely demanding for new, more efficient three dimensional content representation formats. One promising format is the so-called multiview plus depth format where multiple views of the observed scene are represented together with a per pixel depth map. Naturally, this format requires to efficiently code not only the video views but also the depth data. In this context, this paper proposes a novel depth map codec encoding the depth data by means of a binary tree triangular decomposition and reconstructing the depth map values by means of a triangle based planar approximation. For depth maps related to typical three dimensional video contents, the proposed depth map codec outperforms both the H.264/AVC standard with all intra coding modes enabled and the JPEG standard. In particular, for the same objective reconstruction quality, the proposed codec allows an average bitrate reduction of 60% and 90% regarding H.264/AVC Intra and JPEG coding, respectively.","PeriodicalId":433997,"journal":{"name":"2011 IEEE International Conference on Multimedia and Expo","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124565516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging an image folksonomy and the Signature Quadratic Form Distance for semantic-based detection of near-duplicate video clips","authors":"Hyun-seok Min, J. Choi, W. D. Neve, Yong Man Ro","doi":"10.1109/ICME.2011.6011937","DOIUrl":"https://doi.org/10.1109/ICME.2011.6011937","url":null,"abstract":"Being able to detect near-duplicate video clips (NDVCs) is a prerequisite for a plethora of multimedia applications. Given the observation that content transformations tend to preserve semantic information, techniques for NDVC detection may benefit from the use of a semantic approach. This paper discusses how an image folksonomy (i.e., community-contributed images and metadata) and the Signature Quadratic Form Distance (SQFD) can be leveraged for the purpose of identifying NDVCs. Experimental results obtained for the MIRFLICKR-25000 image set and the TRECVID 2009 video set indicate that an image folksonomy and SQFD can be successfully used for detecting NDVCs. In addition, our findings show that model-free NDVC detection (i.e., NDVC detection using an image folksonomy) has a higher semantic coverage than model-based NDVC detection (i.e., NDVC detection using the VIREO-374 semantic concept models).","PeriodicalId":433997,"journal":{"name":"2011 IEEE International Conference on Multimedia and Expo","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124585125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"3D facial mesh detection using geometric saliency of surface","authors":"Yaochen Li, Yuehu Liu, Yuanchu Wang, Zhengwang Wu, Yang Yang","doi":"10.1109/ICME.2011.6012122","DOIUrl":"https://doi.org/10.1109/ICME.2011.6012122","url":null,"abstract":"This paper proposes a 3D facial mesh detection algorithm based on the geometric saliency of surface. Specifically, the geometric saliency of each vertex on 3D triangle mesh is measured by the combination of Gaussian-weighted curvature and spin-image correlation. Salient vertices with similar properties are clustered into regions on the saliency map, and represented as nodes by the graph model. To detect a 3D facial mesh, initialization and registration steps are applied to match each triangle in the graph model with a reference graph, corresponding to a 3D reference facial mesh. Furthermore, the match error between the graph model of the testing 3D mesh and the reference facial mesh is computed to classify face and non-face meshes. Experimental results demonstrate that the proposed algorithm is effective to detect 3D facial meshes and robust to facial expressions and geometric noises.","PeriodicalId":433997,"journal":{"name":"2011 IEEE International Conference on Multimedia and Expo","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130266913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic quality control of digital image content reconstruction schemes","authors":"Pawel Korus, L. Janowski, P. Romaniak","doi":"10.1109/ICME.2011.6011872","DOIUrl":"https://doi.org/10.1109/ICME.2011.6011872","url":null,"abstract":"In this study we address the problem of an image quality trade-off that can be observed when dealing with content reconstruction schemes based on self-embedding. We derive two models for the estimation of optimal system parameters and the optimization of the overall image quality. This goal is achieved by balancing the distortions of a different nature that affect the resulting images. The performance of the derived models is verified with an accurate reference model and compared to traditional parameter selection strategies. The models are based on basic image features only and allow for rapid prediction of the best values for system parameters in a fully automatic manner.","PeriodicalId":433997,"journal":{"name":"2011 IEEE International Conference on Multimedia and Expo","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130342179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A framework for adaptive interaction support based on quality of context information","authors":"M. A. Hossain","doi":"10.1109/ICME.2011.6012211","DOIUrl":"https://doi.org/10.1109/ICME.2011.6012211","url":null,"abstract":"There is a growing interest in academia and industry for designing and developing ambient multimedia environments. Numerous sensors, devices and multimedia services are available in such environment to support users in their lives. Surprisingly, users often find it difficult in interacting with such environments due to the presence of numerous devices and services. To address this issue, context-aware implicit or automatic interaction mechanisms have been proposed to facilitate easy access of the available devices and services and reduce the cognitive load of the user. However, such implicit interactions often lead to mis-automation due to imprecision in context information. This ultimately causes distrust and dissatisfaction to the user. This paper proposes a framework that considers quality of context information to dynamically adjust the level of implicit interaction and allows a system to operate in different modes ranging from full-automation to action suggestion to simple notification. Our initial user experiments demonstrate that dynamic and alternative mode of interaction not only increases the satisfaction of the users but also helps to avoid distrust in automated actions carried out by the ambient environment under varying contexts.","PeriodicalId":433997,"journal":{"name":"2011 IEEE International Conference on Multimedia and Expo","volume":"34 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123212064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluation of adaptive LDPC AL-FEC codes for content download services","authors":"I. D. Fez, F. Fraile, R. Belda, J. C. Guerri","doi":"10.1109/ICME.2011.6011980","DOIUrl":"https://doi.org/10.1109/ICME.2011.6011980","url":null,"abstract":"This paper presents a performance analysis of adaptive LDPC AL-FEC codes for content download services over erasure channels. In adaptive LDPC codes, clients inform the content download server of the losses they are experiencing. Using this information, the server makes FEC parity symbols available to the client at an optimum code rate. Results show the performance of adaptive AL-FEC codes for different scenarios as compared to non-adaptive AL-FEC, to optimum LDPC AL-FEC codes and to an almost ideal rateless code. Adaptive LDPC AL-FEC codes achieve download times similar to almost ideal rateless codes with less coding complexity, at the expense of an interaction channel between server and clients.","PeriodicalId":433997,"journal":{"name":"2011 IEEE International Conference on Multimedia and Expo","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123468552","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive depth map assisted matting in 3D video","authors":"Wenxiu Sun, O. Au, Lingfeng Xu, Zhiding Yu","doi":"10.1109/ICME.2011.6012043","DOIUrl":"https://doi.org/10.1109/ICME.2011.6012043","url":null,"abstract":"Depth map is widely adopted and available in the 3D research area. Combining the depth map with the matting techniques is helpful to the original matte and depth image based rendering in 3D. In this paper, a novel adaptive depth map assisted matting approach with concise integration is presented and applied to achieve favorable matting results. In this approach, the Lagrange-multiplier-free closed form solution is firstly derived to reduce the computation complexity and to increase matting accuracy. Based on the work of Levin et al. on closed form matting, an improved alpha matte is then achieved by introducing an adaptive smoothness criterion which is the function of depth map variance. Finally, the matting system is capable of working in a fully automatic way by generating the trimap from the depth information. Simulation results demonstrate that the proposed method is able to efficiently generate an alpha matte with roughly user-specified scribbles or an automatically generated trimap.","PeriodicalId":433997,"journal":{"name":"2011 IEEE International Conference on Multimedia and Expo","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121338533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Compression of 3D MRI images based on symmetry in prediction-error field","authors":"S. Amraee, N. Karimi, S. Samavi, S. Shirani","doi":"10.1109/ICME.2011.6011897","DOIUrl":"https://doi.org/10.1109/ICME.2011.6011897","url":null,"abstract":"Three dimensional MRI images which are power tools for diagnosis of many diseases require large storage space. A number of lossless compression schemes exist for this purpose. In this paper we propose a new approach for the compression of these images which exploits the inherent symmetry that exists in the 3D MRI images. A block matching routine is employed to work on the symmetrical characteristics of these images. Another type of block matching is also applied to eliminate the inter-slice temporal correlations. The obtained results outperform the existing standard compression techniques.","PeriodicalId":433997,"journal":{"name":"2011 IEEE International Conference on Multimedia and Expo","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114075427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A decision support engine for video surveillance systems","authors":"D. Ahmed, S. Shirmohammadi","doi":"10.1109/ICME.2011.6012164","DOIUrl":"https://doi.org/10.1109/ICME.2011.6012164","url":null,"abstract":"Design and implementation of an automated or a semi-automated surveillance system is an active research area. Safety concerns of individuals have accelerated the research for identifying alarming human activities in public spaces like streets, shopping malls, airports, and others. But reliance on human operator for real-time actions can be inappropriate and expensive. On the other hand, demonstrating pragmatic scene of a surveillanced area with the help of a synthetic world can be effective that can guard operator's limitation. In this paper, we design a Decision Support Engine (DSE) coupled with a synthetic space to facilitate surveillance activities of operators. For this purpose, the detailed sensory data are processed and alarms are detected, classified and ranked according to the threat severity that follows some well-defined rules of the system. At the end, the identified alarming cases are marked and presented with the help of a synthetic environment in an elegant way so that the operators can take the right action in a specific security circumstance.","PeriodicalId":433997,"journal":{"name":"2011 IEEE International Conference on Multimedia and Expo","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114099214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}