{"title":"Subjective Evaluations of an Experimental Gesturephone","authors":"Mohd Nazri Ramliy, N. Arif, R. Komiya","doi":"10.1109/ICME.2006.262524","DOIUrl":"https://doi.org/10.1109/ICME.2006.262524","url":null,"abstract":"This paper presents the findings of our subjective evaluations on the integration of gestures in telecommunication. The experimental setup for tracking and imitating the human arm gesture are described. Our research investigates the possibility of transferring this often overlooked communication medium in our daily communication, for its application in telecommunication using robotics. Based on the subjective evaluation, a maximum allowable delay for an imperceptible gesture reconstruction in the lateral setup is suggested","PeriodicalId":339258,"journal":{"name":"2006 IEEE International Conference on Multimedia and Expo","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133427470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Motion Segmentation of 3D Video using Modified Shape Distribution","authors":"T. Yamasaki, K. Aizawa","doi":"10.1109/ICME.2006.262929","DOIUrl":"https://doi.org/10.1109/ICME.2006.262929","url":null,"abstract":"In this paper, temporal segmentation of 3D video based on motion analysis is presented. 3D video is a sequence of 3D models made for a real-world dynamic object. A modified shape distribution algorithm is proposed to realize stable shape feature representation. In our approach, representative points are generated by clustering vertices based on their spatial distribution instead of randomly sampling vertices as in the original shape distribution algorithm. Motion segmentation is conducted analyzing local minima in degree of motion calculated in the feature vector space. The segmentation algorithm developed in this paper does not require any predefined threshold values but rely on relative relationships among local minima and local maxima of the motion. Therefore, robust segmentation has been achieved. The experiments using 3D video of traditional dances yielded encouraging results with the precision and recall rates of 93% and 88%, respectively, on average","PeriodicalId":339258,"journal":{"name":"2006 IEEE International Conference on Multimedia and Expo","volume":"190 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133522819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis of Multi-User Congestion Control for Video Streaming Over Wireless Networks","authors":"Xiaoqing Zhu, B. Girod","doi":"10.1109/ICME.2006.262639","DOIUrl":"https://doi.org/10.1109/ICME.2006.262639","url":null,"abstract":"When multiple video sources are live-encoded and transmitted over a common wireless network, each stream needs to adapt its encoding parameters to wireless channel fluctuations, so as to avoid congesting the network. We present a stochastic system model for analyzing multi-user congestion control for live video coding and streaming over a wireless network. Variations in video content complexities and wireless channel conditions are modeled as independent Markov processes, which jointly determine the bottleneck queue size of each stream. Interaction among multiple users are captured by a simple model of random traffic contention. Using the model, we investigate two distributed congestion control policies: an approach based on stochastic dynamic programming (SDP) and a greedy heuristic. Compared to fixed-quality coding with no congestion control, performance gains in the range of 0.5-1.3 dB in average video quality are reported for the optimized schemes from simulation results","PeriodicalId":339258,"journal":{"name":"2006 IEEE International Conference on Multimedia and Expo","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133597995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Robust Method for TV Logo Tracking in Video Streams","authors":"Jinqiao Wang, Ling-yu Duan, Zhenglong Li, J. Liu, Hanqing Lu, Jesse S. Jin","doi":"10.1109/ICME.2006.262712","DOIUrl":"https://doi.org/10.1109/ICME.2006.262712","url":null,"abstract":"Most broadcast stations rely on TV logos to claim video content ownership or visually distinguish the broadcast from the interrupting commercial block. Detecting and tracking a TV logo is of interest to TV commercial skipping applications and logo-based broadcasting surveillance (abnormal signal is accompanied by logo absence). Pixel-wise difference computing within predetermined logo regions cannot address semi-transparent TV logos well for the blending effects of a logo itself and inconstant background images. Edge-based template matching is weak for semi-transparent ones when incomplete edges appear. In this paper we present a more robust approach to detect and track TV logos in video streams on the basis of multispectral images gradient. Instead of single frame based detection, our approach makes use of the temporal correlation of multiple consecutive frames. Since it is difficult to manually delineate logos of irregular shape, an adaptive threshold is applied to the gradient image in subpixel space to extract the logo mask. TV logo tracking is finally carried out by matching the masked region with a known template. An extensive comparison experiment has shown our proposed algorithm outperforms traditional methods such as frame difference, single frame-based edge matching. Our experimental dataset comes from part of TRECVID2005 news corpus and several Chinese TV channels with challenging TV logos","PeriodicalId":339258,"journal":{"name":"2006 IEEE International Conference on Multimedia and Expo","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133682013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identify Sports Video Shots with \"Happy\" or \"Sad\" Emotions","authors":"Jinjun Wang, Chng Eng Siong, Changsheng Xu, Hanqing Lu, Xiaofeng Tong","doi":"10.1109/ICME.2006.262641","DOIUrl":"https://doi.org/10.1109/ICME.2006.262641","url":null,"abstract":"Semantic video content extraction and selection are critical steps in sports video analysis and editing. The identification of video segments can be from various semantic perspectives, e.g. certain event, player or emotional state. In this paper, we examined the possibility of automatically identifying shots with \"happy\" or \"sad\" emotion from broadcast sports video. Our proposed model first performs the sports highlight extraction to obtain candidate shots that possibly contain emotion information and then classifies these shots into either \"happy\" or \"sad\" emotion groups using hidden Markov model based method. The final experimental results are satisfactory","PeriodicalId":339258,"journal":{"name":"2006 IEEE International Conference on Multimedia and Expo","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134000704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"More: A Mobile Open Rich Media Environment","authors":"V. Setlur, T. Çapin, Suresh Chitturi, Ramakrishna Vedantham, M. Ingrassia","doi":"10.1109/ICME.2006.262612","DOIUrl":"https://doi.org/10.1109/ICME.2006.262612","url":null,"abstract":"'Rich media' is a term that implies the integration of all of the advances we have made in the mobile space delivering music, speech, text, graphics and video. This is true, but it is more than the sum of its parts. Rich media is the ability to deliver these modalities, to interact with these modalities, and to do it in a way that allows for the construction, delivery and use of compelling mobile services in an effective and economic manner. In this paper, we introduce a system called mobile open rich-media environment ('MORE') that helps realize such mobile rich media services, combining various technologies of W3C, OMA, 3GPP and IETF standards. The different components of the system include formatting, packaging, transporting, rendering and interacting with rich media files and streams","PeriodicalId":339258,"journal":{"name":"2006 IEEE International Conference on Multimedia and Expo","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132239938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Minimum Distortion Look-Up Table Based Data Hiding","authors":"Xiaofeng Wang, Xiao-Ping Zhang","doi":"10.1109/ICME.2006.262786","DOIUrl":"https://doi.org/10.1109/ICME.2006.262786","url":null,"abstract":"In this paper, we present a novel data hiding scheme based on the minimum distortion look-up table (LUT) embedding that achieves good distortion-robustness performance. We first analyze the distortion introduced by LUT embedding and formulate its relationship with run constraints of LUT. Subsequently, a Viterbi algorithm is presented to find the minimum distortion LUT. Theoretical analysis and numerical results show that the new LUT design achieves not only less distortion but also more robustness than the traditional LUT based data embedding schemes","PeriodicalId":339258,"journal":{"name":"2006 IEEE International Conference on Multimedia and Expo","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134461038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable Image Retrieval from Distributed Images Database","authors":"T. Tillo, Marco Grangetto, G. Olmo","doi":"10.1109/ICME.2006.262899","DOIUrl":"https://doi.org/10.1109/ICME.2006.262899","url":null,"abstract":"In order to store, and retrieve images from large databases, we propose a framework, based on multiple description coding paradigms, that disseminates images over distributed servers. Consequently, decentralized download can be performed, thus reducing links overload and hotspot areas without penalizing downloads speed. Moreover, the tradeoff between system reliability and storage requirement can be achieved by tuning descriptions redundancy, thus providing high flexibility in terms of storage resources, reliability of access, and performance. The scalability of the proposed framework is achieved by the intrinsic progressivity of the multiple description schemes. Moreover, we demonstrate that system can work properly regardless of server crashes","PeriodicalId":339258,"journal":{"name":"2006 IEEE International Conference on Multimedia and Expo","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134510607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Music Signal Synthesis using Sinusoid Models and Sliding-Window Esprit","authors":"Anders Gunnarsson, I. Gu","doi":"10.1109/ICME.2006.262798","DOIUrl":"https://doi.org/10.1109/ICME.2006.262798","url":null,"abstract":"This paper proposes a music signal synthesis scheme that is based on sinusoid modeling and sliding-window ESPRIT. Despite widely used audio coding standards, effectively synthesizing music using sinusoid models, more suitable for harmonic rich music signals, remains an open issue. In the proposed scheme, music signals are modeled by a sum of damped sinusoids in noise. A sliding window ESPRIT algorithm is applied. A continuity constraint is then imposed for tracking the time trajectories of sinusoids in music and for removing spurious spectral peaks in order to adapt to the changing number of sinusoid contents in dynamic music. Simulations have been performed to several music signals with a range of complexities, including music recorded from banjo, flute and music with mixed instruments. The results from listening and spectrograms have strongly indicated that the proposed method is very robust for music synthesis with good quality","PeriodicalId":339258,"journal":{"name":"2006 IEEE International Conference on Multimedia and Expo","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134526987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Hand Gesture Rendering and Decoding using a Simple Gesture Library","authors":"Jason Smith, L. Yin","doi":"10.1109/ICME.2006.262916","DOIUrl":"https://doi.org/10.1109/ICME.2006.262916","url":null,"abstract":"Recent work in hand gesture rendering and decoding has treated the two fields as separate and distinct. As the work of rendering evolves, it emphasizes exact movement replication, including more muscle and skeletal parameterization. The work in gesture decoding is largely centered on trained systems, which require large amounts of time in front of a camera rendering a gesture in order to decode movement. This paper presents a new scheme which more tightly couples the gesture rendering and decoding processes. While this scheme is simpler than existing techniques, the rendering remains natural looking, and decoding a new gesture does not require extensive training","PeriodicalId":339258,"journal":{"name":"2006 IEEE International Conference on Multimedia and Expo","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133475097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}