{"title":"Emotional Speech Synthesis using Subspace Constraints in Prosody","authors":"Shinya Mori, T. Moriyama, S. Ozawa","doi":"10.1109/ICME.2006.262725","DOIUrl":"https://doi.org/10.1109/ICME.2006.262725","url":null,"abstract":"An efficient speech synthesis method that uses subspace constraint in prosody is proposed. Conventional unit selection methods concatenate speech segments stored in database, that require enormous number of waveforms in synthesizing various emotional expressions with arbitrary texts. The proposed method employs principal component analysis to reduce the dimensionality of prosodic components, that also allows us to generate new speech that are similar to training samples. The subspace constraint assures that the prosody of the synthesized speech including F0, power, and speech length hold their correlative relation that training samples of emotional speech have. We assume that the combination of the number of syllables and the accent type determines the correlative dynamics of prosody, for each of which we individually construct the subspace. The subspace is then linearly related to emotions by multiple regression analysis that are obtained by subjective evaluation for the training samples. Experimental results demonstrated that only 4 dimensions were sufficient for representing the prosodic changes due to emotion at over 90% of the total variance. Synthesized emotion were successfully recognized by the listeners of the synthesized speech, especially for \"anger\", \"surprise\", \"disgust\", 'sorrow\", \"boredom\", \"depression\", and \"joy\"","PeriodicalId":339258,"journal":{"name":"2006 IEEE International Conference on Multimedia and Expo","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114564792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementation and Evolution of Packet Striping for Media Streaming Over Multiple Burst-Loss Channels","authors":"Gene Cheung, P. Sharma, Sung-Ju Lee","doi":"10.1109/ICME.2006.262734","DOIUrl":"https://doi.org/10.1109/ICME.2006.262734","url":null,"abstract":"Modern mobile devices are multi-homed with WLAN and WWAN communication interfaces. In a community of nodes with such multi-homed devices-locally inter-connected via high-speed WLAN but each globally connected to larger networks via low-speed WWAN, striping high-volume traffic from remote large networks over a bundle of low speed WWAN links can overcome the bandwidth mismatch problem between WLAN and WWAN. In our previous work, we showed that a packet striping system for such multi-homed devices-a mapping of delay-sensitive packets by an intermediate gateway to multiple channels using combination of retransmissions (ARQ) and forward error corrections (FEC)-can dramatically enhance the overall performance. In this paper, we improve upon a previous algorithm in two respects. First, by introducing two-tier dynamic programming tables to memoize computed solutions, packet striping decisions translate to simple table lookup operations given stationary network statistics. Doing so drastically reduces striping operation complexity. Second, new weighting functions are introduced into the hybrid ARQ/FEC algorithm to drive the long-term striping system evolution away from pathological local minima that are far from the global optimum. Results show the new algorithm performs efficiently and gives improved performance by avoiding local minima compared to the previous algorithm","PeriodicalId":339258,"journal":{"name":"2006 IEEE International Conference on Multimedia and Expo","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121866665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Online Training-Oriented Video Shooting Navigation System Based on Real-Time Camerawork Evaluation","authors":"Masahito Kumano, K. Uehara, Y. Ariki","doi":"10.1109/ICME.2006.262772","DOIUrl":"https://doi.org/10.1109/ICME.2006.262772","url":null,"abstract":"In this paper, we propose an online training-oriented video shooting navigation system focused on camerawork based on video grammar by real-time camerawork evaluation to train users shooting nice shots for the later editing work. In this system, the processing speed must be very high so that we use a luminance projection correlation and a structure tensor method to extract the camerawork parameters in real-time. From the results of camerawork analysis, the results of each frame are classified into 7 camerawork types and the system issues 6 types of alarms and navigates users along the specified shot depending on camerawork based on video grammar in real-time while shooting the shot. Thereby, users can naturally acquire shooting style by trying to decrease alarms of improper camerawork without a consideration of the video grammar","PeriodicalId":339258,"journal":{"name":"2006 IEEE International Conference on Multimedia and Expo","volume":"199 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122009310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A High Throughput VLSI Architecture Design for H.264 Context-Based Adaptive Binary Arithmetic Decoding with Look Ahead Parsing","authors":"Yao-Chang Yang, Chien-Chang Lin, Hsui-Cheng Chang, Ching-Lung Su, Jiun-In Guo","doi":"10.1109/ICME.2006.262510","DOIUrl":"https://doi.org/10.1109/ICME.2006.262510","url":null,"abstract":"In this paper we present a high throughput VLSI architecture design for context-based adaptive binary arithmetic decoding (CABAD) in MPEG-4 AVC/H.264. To speed-up the inherent sequential operations in CABAD, we break down the processing bottleneck by proposing a look-ahead codeword parsing technique on the segmenting context tables with cache registers, which averagely reduces up to 53% of cycle count. Based on a 0.18 mum CMOS technology, the proposed design outperforms the existing design by both reducing 40% of hardware cost and achieving about 1.6 times data throughput at the same time","PeriodicalId":339258,"journal":{"name":"2006 IEEE International Conference on Multimedia and Expo","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116835148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Matching Faces with Textual Cues in Soccer Videos","authors":"M. Bertini, A. Bimbo, W. Nunziati","doi":"10.1109/ICME.2006.262444","DOIUrl":"https://doi.org/10.1109/ICME.2006.262444","url":null,"abstract":"In soccer videos, most significant actions are usually followed by close-up shots of players that take part in the action itself. Automatically annotating the identity of the players present in these shots would be considerably valuable for indexing and retrieval applications. Due to high variations in pose and illumination across shots however, current face recognition methods are not suitable for this task. We show how the inherent multiple media structure of soccer videos can be exploited to understand the players' identity without relying on direct face recognition. The proposed method is based on a combination of interest point detector to \"read\" textual cues that allow to label a player with its name, such as the number depicted on its jersey, or the superimposed text caption showing its name. Players not identified by this process are then assigned to one of the labeled faces by means of a face similarity measure, again based on the appearance of local salient patches. We present results obtained from soccer videos taken from various recent games between national teams","PeriodicalId":339258,"journal":{"name":"2006 IEEE International Conference on Multimedia and Expo","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128534441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalability in Human Shape Analysis","authors":"Thomas Fourès, P. Joly","doi":"10.1109/ICME.2006.262651","DOIUrl":"https://doi.org/10.1109/ICME.2006.262651","url":null,"abstract":"This paper proposes a new approach for the human motion analysis. The main contribution comes from the proposed representation of the human body. Most of already existing systems are based on a model. When this one is a priori known, it may not evolve automatically according to user needs, or to the detail level that is actually possible to extract, or to restrictions due to the processing time. In order to propose a more flexible system, a hierarchical representation of the human body is implemented. It aims at providing a multi-resolution description and results at different levels of accuracy. An explanation about the model construction and the method used to map it onto features extracted from an image sequence are presented. Relations between the different body limbs and some physical constraints are then integrated. The transition from a model level to the next one is also explained and results on frames coming from a video sequence give an illustration of the proposed strategy","PeriodicalId":339258,"journal":{"name":"2006 IEEE International Conference on Multimedia and Expo","volume":"1992 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128609682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Cross-Layered Peer-to-Peer Architecture for Wireless Mobile Networks","authors":"Mohammad Mursalin Akon, S. Naik, Ajit Singh, Xuemin Shen","doi":"10.1109/ICME.2006.262625","DOIUrl":"https://doi.org/10.1109/ICME.2006.262625","url":null,"abstract":"In this paper, we propose a novel peer-to-peer architecture for wireless mobile networks where a cross-layered gossip-like protocol is the heart of the architecture. The goal of this architecture is to reduce the bandwidth consumption and at the same time, to provide more user participation flexibility. Simulation results are given to demonstrate the performance of the proposed peer-to-peer architecture","PeriodicalId":339258,"journal":{"name":"2006 IEEE International Conference on Multimedia and Expo","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129277390","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GPU Accelerated Inverse Photon Mapping for Real-Time Surface Reflectance Modeling","authors":"Takashi Machida, N. Yokoya, H. Takemura","doi":"10.1109/ICME.2006.262528","DOIUrl":"https://doi.org/10.1109/ICME.2006.262528","url":null,"abstract":"This paper investigates the problem of object surface reflectance modeling, which is sometimes referred to as inverse reflectometry, for photorealistic rendering and effective multimedia applications. A number of methods have been developed for estimating object surface reflectance properties in order to render real objects under arbitrary illumination conditions. However, it is still difficult to densely estimate surface reflectance properties in real-time. This paper describes a new method for real-time estimation of the non-uniform surface reflectance properties in the inverse rendering framework. Experiments are conducted in order to demonstrate the usefulness and the advantage of the proposed methods through comparative study","PeriodicalId":339258,"journal":{"name":"2006 IEEE International Conference on Multimedia and Expo","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123491495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video News Shot Labeling Refinement via Shot Rhythm Models","authors":"J. Kender, M. Naphade","doi":"10.1109/ICME.2006.262544","DOIUrl":"https://doi.org/10.1109/ICME.2006.262544","url":null,"abstract":"We present a three-step post-processing method for increasing the precision of video shot labels in the domain of television news. First, we demonstrate that news shot sequences can be characterized by rhythms of alternation (due to dialogue), repetition (due to persistent background settings), or both. Thus a temporal model is necessarily third-order Markov. Second, we demonstrate that the output of feature detectors derived from machine learning methods (in particular, from SVMs) can be converted into probabilities in a more effective way than two suggested existing methods. This is particularly true when detectors are errorful due to sparse training sets, as is common in this domain. Third, we demonstrate that a straightforward application of the Viterbi algorithm on a third-order FSM, constructed from observed transition probabilities and converted feature detector outputs, can refine feature label precision at little cost. We show that on a test corpus of TRECVID 2005 news videos annotated with 39 LSCOM-lite features, the mean increase in the measure of average precision (AP) was 4%, with some of the rarer and more difficult features having relative increases in AP of as much as 67%","PeriodicalId":339258,"journal":{"name":"2006 IEEE International Conference on Multimedia and Expo","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114623917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint Source-Channel Decoding of Multiple Description Quantized and Variable Length Coded Markov Sequences","authors":"X. Wang, Xiaolin Wu","doi":"10.1109/ICME.2006.262808","DOIUrl":"https://doi.org/10.1109/ICME.2006.262808","url":null,"abstract":"This paper proposes a framework for joint source-channel decoding of Markov sequences that are encoded by an entropy coded multiple description quantizer (MDQ), and transmitted via a lossy network. This framework is particularly suited for lossy networks of inexpensive energy-deprived mobile source encoders. Our approach is one of maximum aposteriori probability (MAP) sequence estimation that exploits both the source memory and the correlation between different MDQ descriptions. The MAP problem is modeled and solved as one of the longest path in a weighted directed acyclic graph. For MDQ-compressed Markov sequences impaired by both bit errors and erasure errors, the proposed joint source-channel MAP decoder can achieve 5 dB higher SNR than the conventional hard-decision decoder. Furthermore, the new MDQ decoding technique unifies the treatments of different subsets of the K descriptions available at the decoder, circumventing the thorny issue of requiring up to 2K-1 MDQ side decoders","PeriodicalId":339258,"journal":{"name":"2006 IEEE International Conference on Multimedia and Expo","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116284559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}