{"title":"Multi-modality likelihood based particle filtering for 2-D direction of arrival tracking using a single acoustic vector sensor","authors":"X. Zhong, A. Premkumar, A. Madhukumar, C. Lau","doi":"10.1109/ICME.2011.6011965","DOIUrl":"https://doi.org/10.1109/ICME.2011.6011965","url":null,"abstract":"The general problem addressed in this paper is tracking the 2-D direction of arrival (DOA) of an acoustic source signal by using a single acoustic vector sensor (AVS). A Bayesian framework and its particle filtering implementation are introduced to adapt to the underwater ambient noise environment, in which both the interference and background noise exist. Several innovations are explored here: 1) a particle filtering based acoustic source tracking algorithm for AVS is developed; and 2) by using a multi-modality likelihood model to model the source detection and false alarm separately, the algorithm is able to alleviate the effect due to noise and interference. Particularly, by employing additional acoustic information, the proposed approach is able to track the 2-D DOA by using a single AVS. The performance of proposed approach is fully investigated under different simulated ambient noisy environments. Experiment results show that the proposed algorithm outperforms the traditional Capon beamforming approach and is able to lock on the 2-D DOA of the source even in a very challenging environment.","PeriodicalId":433997,"journal":{"name":"2011 IEEE International Conference on Multimedia and Expo","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125022540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image compression algorithm based on Hilbert Scanning of Embedded quadTrees: An introduction of the Hi-SET coder","authors":"Jesús Jaime Moreno Escobar, X. Otazu","doi":"10.1109/ICME.2011.6011870","DOIUrl":"https://doi.org/10.1109/ICME.2011.6011870","url":null,"abstract":"In this work we present an effective and computationally simple algorithm for image compression based on Hilbert Scanning of Embedded quadTrees (Hi-SET). It allows to represent an image as an embedded bitstream along a fractal function. Embedding is an important feature of modern image compression algorithms, in this way Salomon in [1, pg. 614] cite that another feature and perhaps a unique one is the fact of achieving the best quality for the number of bits input by the decoder at any point during the decoding. Hi-SET possesses also this latter feature. Furthermore, the coder is based on a quadtree partition strategy, that applied to image transformation structures such as discrete cosine or wavelet transform allows to obtain an energy clustering both in frequency and space. The coding algorithm is composed of three general steps, using just a list of significant pixels. The implementation of the proposed coder is developed for gray-scale and color image compression. Hi-SET compressed images are, on average, 6.20dB better than the ones obtained by other compression techniques based on the Hilbert scanning. Moreover, Hi-SET improves the image quality in 1.39dB and 1.00dB in gray-scale and color compression, respectively, when compared with JPEG2000 coder.","PeriodicalId":433997,"journal":{"name":"2011 IEEE International Conference on Multimedia and Expo","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125067263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-time local stereo matching using guided image filtering","authors":"A. Hosni, M. Bleyer, Christoph Rhemann, M. Gelautz, C. Rother","doi":"10.1109/ICME.2011.6012131","DOIUrl":"https://doi.org/10.1109/ICME.2011.6012131","url":null,"abstract":"Adaptive support weight algorithms represent the state-of-the-art in local stereo matching. Their limitation is a high computational demand, which makes them unattractive for many (real-time) applications. To our knowledge, the algorithm proposed in this paper is the first local method which is both fast (real-time) and produces results comparable to global algorithms. A key insight is that the aggregation step of adaptive support weight algorithms is equivalent to smoothing the stereo cost volume with an edge-preserving filter. From this perspective, the original adaptive support weight algorithm [1] applies bilateral filtering on cost volume slices, and the reason for its poor computational behavior is that bilateral filtering is a relatively slow process. We suggest to use the recently proposed guided filter [2] to overcome this limitation. Analogously to the bilateral filter, this filter has edge-preserving properties, but can be implemented in a very fast way, which makes our stereo algorithm independent of the size of the match window. The GPU implementation of our stereo algorithm can process stereo images with a resolution of 640 × 480 pixels and a disparity range of 26 pixels at 25 fps. According to the Middlebury on-line ranking, our algorithm achieves rank 14 out of over 100 submissions and is not only the best performing local stereo matching method, but also the best performing real-time method.","PeriodicalId":433997,"journal":{"name":"2011 IEEE International Conference on Multimedia and Expo","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126091417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Application of recommendation methods for TV programs","authors":"H. Kosch, Günther Hölbling","doi":"10.1109/ICME.2011.6012112","DOIUrl":"https://doi.org/10.1109/ICME.2011.6012112","url":null,"abstract":"We present the adaptation and evaluation of classification methods for TV Program recommendation. For our evaluation, we collected over a period of 10 months the TV viewing profiles of 67 users with watched 10,845 programs. Based on the results of this evaluation, we realized a TV Recommendation System.","PeriodicalId":433997,"journal":{"name":"2011 IEEE International Conference on Multimedia and Expo","volume":"20 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123542710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Novel cross-layer scheme for video transmission over LTE-based wireless systems","authors":"Sotirios Karachontzitis, T. Dagiuklas, Lampros Dounis","doi":"10.1109/ICME.2011.6012174","DOIUrl":"https://doi.org/10.1109/ICME.2011.6012174","url":null,"abstract":"In this paper, a novel cross-layer scheme is presented for video transmission over LTE-based wireless systems. The proposed cross-layer scheme takes into account parameters from the application layer (I-based versus P-based packets), MAC Layer (Scheduling packets according to their importance) and Physical Layer (Linear Precoding). All these parameters are considered within a novel resource allocation algorithm with transmission rate constraints suitable for video applications. Simulations results have shown that the proposed cross-layer scheme performs better in terms of system throughput and perceived video quality against similar cross-layer schemes.","PeriodicalId":433997,"journal":{"name":"2011 IEEE International Conference on Multimedia and Expo","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126721440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Social focus of attention as a time function derived from multimodal signals","authors":"D. Korchagin, H. R. Abutalebi","doi":"10.1109/ICME.2011.6012241","DOIUrl":"https://doi.org/10.1109/ICME.2011.6012241","url":null,"abstract":"In this paper, we present the results of a study on the social focus of attention as a time function derived from the multisource multimodal signals, recorded by different personal capturing devices during social events. The core of the approach is based on fission and fusion of multichannel audio, video and social modalities to derive the social focus of attention. The results achieved to date on 16+ hours of real-life data prove the feasibility of the approach.","PeriodicalId":433997,"journal":{"name":"2011 IEEE International Conference on Multimedia and Expo","volume":"400 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114378585","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An efficient angle-based shape matching approach towards object recognition","authors":"Zhiyuan Zhang, Aixin Zhang, Jianhua Li, Shenghong Li","doi":"10.1109/ICME.2011.6012233","DOIUrl":"https://doi.org/10.1109/ICME.2011.6012233","url":null,"abstract":"The pixel-based contour map is one of the most common used shape representation methods for shape matching in object recognition field. However it is difficult to remain accurate and efficient at the same time when recognizing the objects with diversity of postures or different presence from different perspectives. To solve this problem, in this paper we propose an angle-based shape matching approach by introducing a new concept of angle-based features. Furthermore, the object recognition process adopting such angle-based shape matching approach is described in detail. With numerous experiments conducted on the Weizmann Horse dataset, we demonstrate that the proposed method is accurate, efficient and robust towards different poses and resolutions at the same time.","PeriodicalId":433997,"journal":{"name":"2011 IEEE International Conference on Multimedia and Expo","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129788700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Block merging for quadtree-based video coding","authors":"S. Oudin, Philipp Helle, J. Stegemann, Christian Bartnik, B. Bross, D. Marpe, H. Schwarz, T. Wiegand","doi":"10.1109/ICME.2011.6012010","DOIUrl":"https://doi.org/10.1109/ICME.2011.6012010","url":null,"abstract":"Quadtree-based block partitioning together with motion-compensated prediction has proven to be an efficient approach in video compression. However, when dealing with spatially neighboring blocks in uniformly displaced regions, quadtree-based partitioning may lead to redundant sets of transmitted motion parameters. This paper proposes and describes a simple but efficient block merging algorithm that aims at removing those redundancies by using only a single parameter set for a whole motion-compensated region of contiguous blocks. Simulation results show that our proposed merging technique works more efficiently than the conceptually similar direct mode as, e.g., specified in H.264/AVC. Due its efficiency and simplicity, our proposed merging approach has been adopted into the first test model of the high efficiency video coding (HEVC) standardization project, as currently pursued by ITU-T VCEG and ISO/IEC MPEG.","PeriodicalId":433997,"journal":{"name":"2011 IEEE International Conference on Multimedia and Expo","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128313013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real 3D interaction behind mobile phones for augmented environments","authors":"Farid Abedan Kondori, Shahrouz Yousefi, Haibo Li","doi":"10.1109/ICME.2011.6012155","DOIUrl":"https://doi.org/10.1109/ICME.2011.6012155","url":null,"abstract":"Number of mobile devices such as mobile phones or PDAs has been dramatically increased over the recent years. New mobile devices are equipped with integrated cameras and large displays which make the interaction with device easier and more efficient. Although most of the previous works on interaction between humans and mobile devices are based on 2D touch-screen displays, camera-based interaction opens a new way to manipulate in 3D space behind the device in the camera's field of view. This paper suggests the use of particular patterns from local orientation of the image called Rotational Symmetries to detect and localize human gesture. Relative rotation and translation of human gesture between consecutive frames are estimated by means of extracting stable features. Consequently, this information can be used to facilitate the 3D manipulation of virtual objects in various applications in mobile devices.","PeriodicalId":433997,"journal":{"name":"2011 IEEE International Conference on Multimedia and Expo","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128340612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prediction Signal Aided Spatially Varying Transform","authors":"Cixun Zhang, K. Ugur, J. Lainema, A. Hallapuro, M. Gabbouj","doi":"10.1109/ICME.2011.6011911","DOIUrl":"https://doi.org/10.1109/ICME.2011.6011911","url":null,"abstract":"Spatially Varying Transform (SVT) is a technique introduced earlier to improve the coding efficiency of video coders [1][2]. SVT allows the position of the transform block within the macroblock to vary in order to better localize the underlying residual signal. The coding gains of SVT come with increased encoding complexity due to the additional need in the encoder to search for the best Location Parameter (LP) which indicates the position of the transform. In this paper, a new technique called Prediction Signal Aided Spatially Varying Transform (PSASVT) is proposed that utilizes the gradient of prediction signal to eliminate the unlikely LPs. As the number of candidate LPs is reduced, a smaller number of LPs are searched by encoder, which reduces the encoding complexity. In addition, less overhead bits are needed to code the selected LP and thus the coding efficiency can be improved. Experimental results show that the number of LPs to be tested in RDO is reduced on average by more than 20%. This reduction in encoding complexity is achieved with a slight increase in coding efficiency, as the number of candidate LPs is reduced. The decoding complexity increase is only a little.","PeriodicalId":433997,"journal":{"name":"2011 IEEE International Conference on Multimedia and Expo","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129113223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}