{"title":"Enhancing stereophonic teleconferencing with microphone arrays through sound field warping","authors":"Wei-ge Chen, Zhengyou Zhang","doi":"10.1109/MMSP.2010.5661989","DOIUrl":"https://doi.org/10.1109/MMSP.2010.5661989","url":null,"abstract":"It has been proven that spatial audio enhances the realism of sound for teleconferencing. Previously, solutions have been proposed for multiparty conferencing where each remote participant is assumed to have his/her own microphone, and for conferencing between two rooms where the microphones in one room are connected to an equal number of loudspeakers in the other room. Either approach has its limitations. Hence, we propose a new scheme to improve the stereophonic conferencing experience through an innovative use of microphone arrays. Instead of operating in the default mode where a single channel is produced using spatial filtering, we propose to transmit all channels forming a collection of spatial samples of the sound field. Those samples are warped appropriately at the remote site, and are spatialized together with audio streams from other remote sites if any, to produce the perception of a virtual sound field. Real-world audio samples are provided to showcase the proposed technique. The informal listening test shows that the majority of the users prefer the new experience.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116206366","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust head pose estimation by fusing time-of-flight depth and color","authors":"Amit Bleiweiss, M. Werman","doi":"10.1109/MMSP.2010.5662004","DOIUrl":"https://doi.org/10.1109/MMSP.2010.5662004","url":null,"abstract":"We present a new solution for real-time head pose estimation. The key to our method is a model-based approach based on the fusion of color and time-of-flight depth data. Our method has several advantages over existing head-pose estimation solutions. It requires no initial setup or knowledge of a pre-built model or training data. The use of additional depth data leads to a robust solution, while maintaining real-time performance. The method outperforms the state of the art in several experiments using extreme situations such as sudden changes in lighting, large rotations, and fast motion.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"138 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133983252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fusion of active and passive sensors for fast 3D capture","authors":"Qingxiong Yang, K. Tan, Bruce Culbertson, J. Apostolopoulos","doi":"10.1109/MMSP.2010.5661996","DOIUrl":"https://doi.org/10.1109/MMSP.2010.5661996","url":null,"abstract":"We envision a conference room of the future where depth sensing systems are able to capture the 3D position and pose of users, and enable users to interact with digital media and contents being shown on immersive displays. The key technical barrier is that current depth sensing systems are noisy, inaccurate, and unreliable. It is well understood that passive stereo fails in non-textured, featureless portions of a scene. Active sensors on the other hand are more accurate in these regions and tend to be noisy in highly textured regions. We propose a way to synergistically combine the two to create a state-of-the-art depth sensing system which runs in near real time. In contrast the only known previous method for fusion is slow and fails to take advantage of the complementary nature of the two types of sensors.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114601450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recovering the output of an OFB in the case of instantaneous erasures in sub-band domain","authors":"Mohsen Akbari, F. Labeau","doi":"10.1109/MMSP.2010.5662032","DOIUrl":"https://doi.org/10.1109/MMSP.2010.5662032","url":null,"abstract":"In this paper, we propose a method for reconstructing the output of an Oversampled Filter Bank (OFB) when instantaneous erasures happen in the sub-band domain. Instantaneous erasure is defined as a situation where the erasure pattern changes in each time instance. This definition differs from the type of erasure usually defined in literature, where e erasures means that e channels of the OFB are off and do not work at all. This new definition is more realistic and increases the flexibility and resilience of the OFB in combating the erasures. Additionally, similar to puncturing, the same idea can be used in an erasure-free channel to reconstruct the output, when sub-band samples are discarded intentionally in order to change the code rate. In this paper we also derive the sufficient conditions that should be met by the OFB in order for the proposed reconstruction method to work. Based on that, eventually we suggest a general form for the OFBs which are robust to this type of erasure.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117099383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical Hole-Filling(HHF): Depth image based rendering without depth map filtering for 3D-TV","authors":"Mashhour Solh, G. Al-Regib","doi":"10.1109/MMSP.2010.5661999","DOIUrl":"https://doi.org/10.1109/MMSP.2010.5661999","url":null,"abstract":"In this paper we propose a new approach for disocclusion removal in depth image-based rendering (DIBR) for 3D-TV. The new approach, Hierarchical Hole-Filling (HHF), eliminates the need for any preprocessing of the depth map. HHF uses a pyramid-like approach to estimate the hole pixels from lower resolution estimates of the 3D warped image. The lower resolution estimates involve a pseudo zero canceling plus Gaussian filtering of the warped image. Then, starting backwards from the lowest resolution hole-free estimate in the pyramid, we interpolate and use the pixel values to fill in the holes in the next higher resolution image. The procedure is repeated until the estimated image is hole-free. Experimental results show that HHF yields virtual images that are free of any geometric distortions, which is not the case in other algorithms that preprocess the depth map. Experiments have also shown that unlike previous DIBR techniques, HHF is not sensitive to depth maps with a high percentage of bad matching pixels.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123242636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Visual quality of current coding technologies at high definition IPTV bitrates","authors":"Christian Keimel, Julian Habigt, Tim Habigt, Martin Rothbucher, K. Diepold","doi":"10.1109/MMSP.2010.5662052","DOIUrl":"https://doi.org/10.1109/MMSP.2010.5662052","url":null,"abstract":"High definition video over IP based networks (IPTV) has become a mainstay in today's consumer environment. In most applications, encoders conforming to the H.264/AVC standard are used. But even within one standard, often a wide range of coding tools are available that can deliver a vastly different visual quality. Therefore we evaluate in this contribution different coding technologies, using different encoder settings of H.264/AVC, but also a completely different encoder like Dirac. We cover a wide range of different bitrates from ADSL to VDSL and different content, with low and high demand on the encoders. As PSNR is not well suited to describe the perceived visual quality, we conducted extensive subjective tests to determine the visual quality. Our results show that for currently common bitrates, the visual quality can be more than doubled if the same coding technology is used with different coding tools.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"97 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122572232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reference frame modification methods in scalable video coding (SVC)","authors":"A. Naghdinezhad, F. Labeau","doi":"10.1109/MMSP.2010.5662019","DOIUrl":"https://doi.org/10.1109/MMSP.2010.5662019","url":null,"abstract":"With the rapid development of multimedia technology, video transmission over error prone channels is widely used. Using predictive video coding can lead to temporal and spatial propagation of channel errors, which consequently results in high degradation in the quality of the received video. In order to address this problem different error resilient methods have been proposed. In this paper, a number of the error resilient methods based on reference frame modification are overviewed briefly and examined with scalable extension of H.264/AVC (SVC). We propose a new method based on hierarchical structure used in temporal scalable coding. Average gains of 0.76 dB over the improved generalized source channel prediction (IGSCP) method and 2.26 dB over normal coding are achieved.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125568359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integrating a HRTF-based sound synthesis system into Mumble","authors":"Martin Rothbucher, Tim Habigt, Johannes Feldmaier, K. Diepold","doi":"10.1109/MMSP.2010.5661988","DOIUrl":"https://doi.org/10.1109/MMSP.2010.5661988","url":null,"abstract":"This paper describes an integration of a Head Related Transfer Function (HRTF)-based 3D sound convolution engine into the open-source VoIP conferencing software Mumble. Our system allows the audio contributions of conference participants to be virtually placed at different positions around a listener, which helps to overcome the problem of identifying active speakers in an audio conference. Furthermore, using HRTFs to generate 3D sound in virtual 3D space, the listener is able to make use of the cocktail party effect in order to differentiate between several simultaneously active speakers. As a result, the intelligibility of communication is increased.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130238545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Private content identification: Performance-privacy-complexity trade-off","authors":"S. Voloshynovskiy, O. Koval, F. Beekhof, F. Farhadzadeh, T. Holotyak","doi":"10.1109/MMSP.2010.5661994","DOIUrl":"https://doi.org/10.1109/MMSP.2010.5661994","url":null,"abstract":"In light of the recent development of multimedia and networking technologies, an exponentially increasing amount of content is available via various public services. That is why content identification attracts a lot of attention. One possible technology for content identification is based on digital fingerprinting. When trying to establish information-theoretic limits in this application, usually it is assumed that the codewords are of infinite length and that a jointly typical decoder is used in the analysis. These assumptions represent a certain over-generalization for the majority of practical applications. Consequently, the impact of the finite length on the mentioned limits remains an open and largely unexplored problem. Furthermore, leaking of privacy-related information to third parties due to storage, distribution and sharing of fingerprinting data represents an emerging research issue that should be addressed carefully. This paper contains an information-theoretic analysis of finite length digital fingerprinting under privacy constraints. A particular link between the considered setup and Forney's erasure/list decoding [1] is presented. Finally, complexity issues of reliable identification in large databases are addressed.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131475558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sigmoid shrinkage for BM3D denoising algorithm","authors":"M. Poderico, S. Parrilli, G. Poggi, L. Verdoliva","doi":"10.1109/MMSP.2010.5662058","DOIUrl":"https://doi.org/10.1109/MMSP.2010.5662058","url":null,"abstract":"In this work we propose a modified version of the BM3D algorithm recently introduced by Dabov et al. [1] for the denoising of images corrupted by additive white Gaussian noise. The original technique performs a multipoint filtering, where the nonlocal approach is combined with the wavelet shrinkage of a 3D cube composed by similar patches collected by means of block-matching. Our improvement concerns the thresholding of wavelet coefficients, which are subject to a different shrinkage depending on their level of sparsity. The modified algorithm is more robust with respect to block matching errors, especially when noise is high, as proved by experimental results on a large set of natural images.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115986177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}