{"title":"Visual QoS programming environment for ubiquitous multimedia services","authors":"Xiaohui Gu, D. Wichadakul, K. Nahrstedt","doi":"10.1109/ICME.2001.1237785","DOIUrl":"https://doi.org/10.1109/ICME.2001.1237785","url":null,"abstract":"The provision of distributed multimedia services is becoming mobile and ubiquitous. Different multimedia services require application-specific Quality of Service (QoS). In this paper, we present QoSTalk, a unified component-based programming environment that allows application developers to specify different application-specific QoS requirements easily. In QoSTalk, we adopt a hierarchical approach to model application configuration graphs for different distributed multimedia services. We design and implement the XML-based Hierarchical QoS Markup Language, called HQML, to describe the hierarchical configuration graph as well as other application-specific QoS requirements and policies. QoSTalk promotes the separation of concerns in developing QoS-aware ubiquitous multimedia applications and thus enables easy programming of QoS-aware applications, running on top of a unified QoS-aware middleware framework. We have prototyped the QoSTalk in Java and CORBA. Our case studies with several multimedia applications show that QoSTalk effectively fills the gap for application developers between the very general facilities provided by the QoS-aware middleware and different kinds of distributed multimedia applications.","PeriodicalId":405589,"journal":{"name":"IEEE International Conference on Multimedia and Expo, 2001. ICME 2001.","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133601241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lessons from speechreading","authors":"P. Scanlon, R. Reilly","doi":"10.1109/ICME.2001.1237780","DOIUrl":"https://doi.org/10.1109/ICME.2001.1237780","url":null,"abstract":"Speechreading is the ability to understand a speaker’s thoughts by watching the movements of the face and body and by using the information provided by the situation and the language. People with normal hearing and the hearing impaired use speechreading to augment communication especially in noisy environments. Just as people learn this skill, machines can be trained to understand a speakers meaning. Audio-Visual Automatic Speech Recognition (AV ASR) systems use audio and visual information to recognize what has been ‘said’. The speech sounds and movements provided need not be standard speech sounds or movements. The system will provide recognition given audio information only, visual information only or both.","PeriodicalId":405589,"journal":{"name":"IEEE International Conference on Multimedia and Expo, 2001. ICME 2001.","volume":"61 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114021866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Audio driven facial animation for audio-visual reality","authors":"T. Faruquie, Ashish Kapoor, Rohit J. Kate, Nitendra Rajput, L. V. Subramaniam","doi":"10.1109/ICME.2001.1237848","DOIUrl":"https://doi.org/10.1109/ICME.2001.1237848","url":null,"abstract":"In this paper, we demonstrate a morphing based automated audio driven facial animation system. Based on an incoming audio stream, a face image is animated with full lip synchronization and expression. An animation sequence using optical flow between visemes is constructed, given an incoming audio stream and still pictures of a face speaking different visemes. Rules are formulated based on coarticulation and the duration of a viseme to control the continuity in terms of shape and extent of lip opening. In addition to this new viseme-expression combinations are synthesized to be able to generate animations with new facial expressions. Finally various applications of this system are discussed in the context of creating audio-visual reality.","PeriodicalId":405589,"journal":{"name":"IEEE International Conference on Multimedia and Expo, 2001. ICME 2001.","volume":"242 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116390316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Duration dependent input output markov models for audio-visual event detection","authors":"M. Naphade, A. Garg, Thomas S. Huang","doi":"10.1109/ICME.2001.1237704","DOIUrl":"https://doi.org/10.1109/ICME.2001.1237704","url":null,"abstract":"Detecting semantic events from audio-visual data with Spatiotemporal support is a challenging multimedia Understanding problem. The difficulty lies in the gap that exists between low level media features and high level semantic concept. We present a duration dependent input output Markov model (DDIOMM) to detect events based on multiple modalities. The DDIOMM combines the ability to model nonexponential duration densities with the mapping of input sequences to output sequences. In spirit it resembles the IOHMMs [1] as well as inhomogeneousHMMs [2]. We use the DDIOMM to model the audio-visual event explosion. We compare the detection performance of the DDIOMM with the IOMM as well as the HMM. Experiments reveal that modeling of duration improves detection performance.","PeriodicalId":405589,"journal":{"name":"IEEE International Conference on Multimedia and Expo, 2001. ICME 2001.","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116496677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An adaptive transmission scheme for audio and video synchronization based on real-time transport protocol","authors":"Chia-Chen Kuo, Ming-Syan Chen, Jeng-Chun Chen","doi":"10.1109/ICME.2001.1237742","DOIUrl":"https://doi.org/10.1109/ICME.2001.1237742","url":null,"abstract":"Multimedia streams impose tight temporal constraints since different kinds of continuous multimedia streams have to be played synchronously. We devise in this paper an adaptive transmission scheme to ensure the continuous and synchronous playback of audio and video streams based on Real-time Transport Protocol. Realization of our adaptive scheme is composed of a series of operations in three stages, namely, (1) dynamic reordering mechanism, (2) decoding-recovery mechanism, and (3) adaptive synchronization mechanism. An empirical study is conducted to provide insights into our adaptive transmission scheme. As validated by our simulation results, the adaptive transmission mechanism is able to strike a good balance of both stable playback and the end-to-end delay reduction. Furthermore, we analyze the jitter resistance, the end-to-end delay, and the buffer size required in order to enhance the applicability of this scheme to more applications that require the transmission of multimedia data.","PeriodicalId":405589,"journal":{"name":"IEEE International Conference on Multimedia and Expo, 2001. ICME 2001.","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116741556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel packet loss recovery technique for multimedia communication","authors":"Wenqing Jiang, Antonio Ortega","doi":"10.1109/ICME.2001.1237896","DOIUrl":"https://doi.org/10.1109/ICME.2001.1237896","url":null,"abstract":"In this paper a novel loss recovery technique is proposed for multimedia communications over lossy packet networks. The proposed technique uses a combination of recent results on multiple description coding and erasure recovery codes in channel coding. The uniqueness of the proposed technique lies in its ability to recover not only the data carried in lost packets, but also the decoding state for successive packets. Experimental results on image and speech coding show that the proposed technique has excellent coding performance compared to some of the best results published and it can also significantly reduce the error propagation in successive packets due to packet losses.","PeriodicalId":405589,"journal":{"name":"IEEE International Conference on Multimedia and Expo, 2001. ICME 2001.","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114664523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MPEG-2 multi-program transport stream transcoder","authors":"Takeshi Takahashi, H. Kasai, T. Hanamura, H. Tominaga","doi":"10.1109/ICME.2001.1237747","DOIUrl":"https://doi.org/10.1109/ICME.2001.1237747","url":null,"abstract":"MPEG-2 Multi-program Transport stream (TS) achieves improvement of transmission efficiency by multiplexing several MPEG-2 streams. In this paper, we propose a transcoder which achieves rate reduction of MPEG-2 multi-program TS. For the purpose of realizing MPEG-2 multi-program TS transcoder, this transcoder requires a rate control method and re-multiplexing method: The former improves average SNR values in total of streams, and the latter achieves the evasion from failure of STD buffer. Next, from simulation experiments, we compare the conventional rate control methods to the proposed one. On the other hand, we show the state of STD buffer. Finally, we show the effectiveness for our proposed scheme.","PeriodicalId":405589,"journal":{"name":"IEEE International Conference on Multimedia and Expo, 2001. ICME 2001.","volume":"82 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115050116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using thesaurus to model keyblock-based image retrieval","authors":"Lei Zhu, Chun Tang, A. Rao, A. Zhang","doi":"10.1109/ICME.2001.1237683","DOIUrl":"https://doi.org/10.1109/ICME.2001.1237683","url":null,"abstract":"Keyblock, which is a new framework we proposed for content-based image retrieval, is a generalization of the textbased information retrieval technology in the image domain. In this framework, keyblocks, which are analogous to keywords in text document retrieval, can be constructed by exploiting the method of Vector Quantization (VQ). Then an image can be represented as a list of keyblocks similar to a text document which can be considered as a list of keywords. Based on this image representation, various feature models can be constructed for supporting image retrieval. In this paper, we present a new feature representation model which use the keyblock-keyblock correlation matrix, termed keyblock-thesaurus, to facilitate the image retrieval. The feature vectors of this new model incorporate the effect of correlation between keyblocks, thus being more effective in representing image content.","PeriodicalId":405589,"journal":{"name":"IEEE International Conference on Multimedia and Expo, 2001. ICME 2001.","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131799330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generation of personalized abstract of sports video","authors":"N. Babaguchi, Yoshihiko Kawai, T. Kitahashi","doi":"10.1109/ICME.2001.1237796","DOIUrl":"https://doi.org/10.1109/ICME.2001.1237796","url":null,"abstract":"Video abstraction is defined as creating a shorter video clip from an original video stream. In this paper, we propose a method of generating a personalized abstract of broadcasted sports video. We first detect significant events from the video stream by matching with gamestats in which highlights of the game are described. Textual information in an overlay appearing on an image frame is recognized for this matching. Then, we select highlight shots from these detected events, reflecting on personal preferences. Finally, we connect each shot augmented with related audio and text in temporal order. From experimental results, we verified that an hourlength video can be compressed into a minute-length personalized abstract.","PeriodicalId":405589,"journal":{"name":"IEEE International Conference on Multimedia and Expo, 2001. ICME 2001.","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129328416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Current status of WebCT and future of information basis for higher education","authors":"S. Kajita","doi":"10.1109/ICME.2001.1237801","DOIUrl":"https://doi.org/10.1109/ICME.2001.1237801","url":null,"abstract":"WebCT that has been used widely in higher educational institutes of North America is going to be a mission critical e-Learning platform in on-campus education, rather than mere WBT system. In this paper, we describe WebCT and its current status in North America from the viewpoint of Japanese higher education. Then, we introduce three critical trends for educational information basis that we can observe in the movement of WebCT in North America; (1) contents exchange hub, (2) the integration of WebCT with existing student information system, and (3) campus portal that provides university-wide one-stop service for all member of the institution. Finally, we give a general view for educational information basis that should be contracted in Japanese higher educational institutes in 200X. We have already had all of technologies in our hands that would be necessary for higher educational institutions in the first decade of 21st century. How we can integrate and implement them to use in our own daily education seems to be a critical issue in the expected competitions in higher education.","PeriodicalId":405589,"journal":{"name":"IEEE International Conference on Multimedia and Expo, 2001. ICME 2001.","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122245206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}