MULTIMEDIA '01 | Pub Date: 2001-10-01 | DOI: 10.1145/500141.500196
Yan Li, Feng Yu, Ying-Qing Xu, Eric Chang, H. Shum
{"title":"Speech-driven cartoon animation with emotions","authors":"Yan Li, Feng Yu, Ying-Qing Xu, Eric Chang, H. Shum","doi":"10.1145/500141.500196","DOIUrl":"https://doi.org/10.1145/500141.500196","url":null,"abstract":"In this paper, we present a cartoon face animation system for multimedia HCI applications. We animate face cartoons not only from input speech, but also based on emotions derived from speech signal. Using a corpus of over 700 utterances from different speakers, we have trained SVMs (support vector machines) to recognize four categories of emotions: neutral, happiness, anger and sadness. Given each input speech phrase, we identify its emotion content as a mixture of all four emotions, rather than classifying it into a single emotion. Then, facial expressions are= generated from the recovered emotion for each phrase, by morphing different cartoon templates that correspond to various emotions. To ensure smooth transitions in the animation, we apply low-pass filtering to the recovered (and possibly jumpy) emotion sequence. Moreover, lip-syncing is applied to produce the lip movement from speech, by recovering a statistical audio-visual mapping. Experimental results demonstrate that cartoon animation sequences generated by our system are of good and convincing quality.","PeriodicalId":416848,"journal":{"name":"MULTIMEDIA '01","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126204971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MULTIMEDIA '01 | Pub Date: 2001-10-01 | DOI: 10.1145/500141.500240
Zhengping Wu, Chun Chen
{"title":"A new foreground extraction scheme for video streams","authors":"Zhengping Wu, Chun Chen","doi":"10.1145/500141.500240","DOIUrl":"https://doi.org/10.1145/500141.500240","url":null,"abstract":"The MPEG-4 video coding standard consists of object based coding schemes for multimedia and enables content based functionalities. Video objects in still pictures or video sequences should be first identified before the encoding process starts. An algorithm based on information fusion, which can be used for the extraction of foreground objects in video streams in real time is proposed in this paper. The method efficiently integrates image and motion information of video streams. The thresholding technique operated in the HSV space provides a better use of the color information than that in the traditional RGB space. Using enhanced boundary extracted from the motion region for contour adaptation offers an original method to refine the contour.","PeriodicalId":416848,"journal":{"name":"MULTIMEDIA '01","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132698924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MULTIMEDIA '01 | Pub Date: 2001-10-01 | DOI: 10.1145/500141.500237
Yixin Chen, James Ze Wang, Jia Li
{"title":"FIRM: fuzzily integrated region matching for content-based image retrieval","authors":"Yixin Chen, James Ze Wang, Jia Li","doi":"10.1145/500141.500237","DOIUrl":"https://doi.org/10.1145/500141.500237","url":null,"abstract":"We propose FIRM (Fuzzily Integrated Region Matching), an efficient and robust similarity measure for region-based image retrieval. Each image in our retrieval system is represented by a set of regions that are characterized by fuzzy sets. The FIRM measure, representing the overall similarity between two images, is defined as the similarity between two families of fuzzy sets. Compared with similarity measures based on individual regions and on all regions with crisp feature representations, our approach greatly reduces the influence of inaccurate segmentation. Experimental results based on a database of about 200,000 general-purpose images demonstrate improved accuracy, robustness, and high speed.","PeriodicalId":416848,"journal":{"name":"MULTIMEDIA '01","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134638017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MULTIMEDIA '01 | Pub Date: 2001-10-01 | DOI: 10.1145/500141.500162
G. Pingali, Agata Opalach, I. Carlbom
{"title":"Multimedia retrieval through spatio-temporal activity maps","authors":"G. Pingali, Agata Opalach, I. Carlbom","doi":"10.1145/500141.500162","DOIUrl":"https://doi.org/10.1145/500141.500162","url":null,"abstract":"As multiple video cameras and other sensors generate very large quantities of multimedia data in media productions and surveillance applications, a key challenge is to identify the relevant portions of the data and to rapidly retrieve the corresponding sensor data. Spatio-temporal activity maps serve as an efficient and intuitive graphical user interface for multimedia retrieval, particularly when the media streams are derived from multiple sensors observing a physical environment. We formulate the media retrieval problem in this context, and develop an architecture for interactive media retrieval by combining spatio-temporal \"activity maps\" with domain specific event information. Activity maps are computed from trajectories of motion of objects in the environment, which in turn are derived automatically by analysis of sensor data. We present an activity map based video retrieval system for the sport of tennis and demonstrate that the activity map based scheme significantly helps the user in a) discovering the relevant portions of the data, and b) non-linearly retrieving the corresponding media streams.","PeriodicalId":416848,"journal":{"name":"MULTIMEDIA '01","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133806158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MULTIMEDIA '01 | Pub Date: 2001-10-01 | DOI: 10.1145/500141.500177
C. Poellabauer, K. Schwan, R. West
{"title":"Coordinated CPU and event scheduling for distributed multimedia applications","authors":"C. Poellabauer, K. Schwan, R. West","doi":"10.1145/500141.500177","DOIUrl":"https://doi.org/10.1145/500141.500177","url":null,"abstract":"Distributed multimedia applications require support from the underlying operating system to achieve and maintain their desired Quality of Service (QoS). This has led to the creation of novel task and message schedulers and to the development of QoS mechanisms that allow applications to explicitly interact with relevant operating system services. However, the task scheduling techniques developed to date are not well equipped to take advantage of such interactions. As a result, important events such as position update messages in virtual environments may be ignored. If a CPU scheduler ignores these events, players will experience a lack of responsiveness or even inconsistencies in the virtual world. This paper argues that real-time and multimedia applications can benefit from coordinatedel event delivery mechanism, termed ECalls, that supports such coordination. We then show ECalls's ability to reduce variations in inter-frame times for media streams.","PeriodicalId":416848,"journal":{"name":"MULTIMEDIA '01","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115596580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MULTIMEDIA '01 | Pub Date: 2001-10-01 | DOI: 10.1145/500141.500198
Li-wei He, Anoop Gupta
{"title":"Exploring benefits of non-linear time compression","authors":"Li-wei He, Anoop Gupta","doi":"10.1145/500141.500198","DOIUrl":"https://doi.org/10.1145/500141.500198","url":null,"abstract":"In comparison to text, audio-video content is much more challenging to browse. Time-compression has been suggested as a key technology that can support browsing-time compression speeds up the playback of audio-video content without causing the pitch to change. Simple forms of time-compression are starting to appear in commercial streaming-media products from Microsoft and Real Networks.In this paper we explore the potential benefits of more recent and advanced types of time compression, called non-linear time compression. The most advanced of these algorithms exploit fine-grain structure of human speech (e.g., phonemes) to differentially speedup segments of speech, so that the overall speedup can be higher. In this paper we explore what are the actual gains achieved by end-users from these advanced algorithms. Our results indicate that the gains are actually quite small in common cases and come with significant system complexity and some audio/video synchronization issues.","PeriodicalId":416848,"journal":{"name":"MULTIMEDIA '01","volume":"264 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114509300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MULTIMEDIA '01 | Pub Date: 2001-10-01 | DOI: 10.1145/500141.500214
Steele Arbeeny, D. Silver
{"title":"Spatial navigation of media streams","authors":"Steele Arbeeny, D. Silver","doi":"10.1145/500141.500214","DOIUrl":"https://doi.org/10.1145/500141.500214","url":null,"abstract":"Interactive multimedia walkthrough applications are useful tools for visualizing complex areas. These environments permit navigation through a virtual space based on intuitive actions like \"go forward\" or \"go left\". The space is generally constructed using computer graphics models and enhanced with video, still images, and sound. While video is usually incorporated into these models, it is played as taken, and generally as a dependent media, i.e. the navigation controls do not control the video even if the video is a \"real\" walkthrough of the virtual space. Integrating the real-world media and the computer graphics model by registering both within a common virtual reality framework would allow navigation in the computer graphics model to control video of the corresponding location in the real world and vice versa. This would cause the computer graphics model to be regenerated based on the camera position in the real world video. Additionally, by registering into the common framework, new media data of any type can be added into the presentation. This paper presents a multimedia data structure capable of supporting these operations. A prototypical mobile video capture device is discussed that can record the required video media and process it for inclusion in the framework described.","PeriodicalId":416848,"journal":{"name":"MULTIMEDIA '01","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114192778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MULTIMEDIA '01 | Pub Date: 2001-10-01 | DOI: 10.1145/500141.500216
Vincent Oria, M. Tamer Özsu, Shu Lin, P. Iglinski
{"title":"Similarity queries in the DISIMA image DBMS","authors":"Vincent Oria, M. Tamer Özsu, Shu Lin, P. Iglinski","doi":"10.1145/500141.500216","DOIUrl":"https://doi.org/10.1145/500141.500216","url":null,"abstract":"In the DISIMA system, an image is composed of salient objects that are regions of interest in the image. A salient object has some syntactic properties (shape, color, textures) on which some similarity searches are defined. In addition, a global multi-precision image similarity based on multi-scale color histograms allows similarity queries on images and sub-images.","PeriodicalId":416848,"journal":{"name":"MULTIMEDIA '01","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116973829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MULTIMEDIA '01 | Pub Date: 2001-10-01 | DOI: 10.1145/500141.500189
Qiang Cheng, Thomas S. Huang
{"title":"An image watermarking technique using pyramid transform","authors":"Qiang Cheng, Thomas S. Huang","doi":"10.1145/500141.500189","DOIUrl":"https://doi.org/10.1145/500141.500189","url":null,"abstract":"An image watermarking technique based on pyramid transforms is proposed. An arbitrary binary pattern is formed into an effective hypothesized pattern and transmitted as a watermark. Multiresolution pyramid transforms are applied to host images, whose characteristics are exploited to embed the watermark. The detector is designed to be effective to a wide range of original signal sources and noise sources. The scheme is designed to achieve efficient trade-offs between perceptual invisibility, robustness and trustworthy detection. The experiments demonstrate that the proposed technique has high imperceptibility, good robustness, and accurate detection. It can be applied to copyright notification, enforcement, and fingerprinting.","PeriodicalId":416848,"journal":{"name":"MULTIMEDIA '01","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123246476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MULTIMEDIA '01 | Pub Date: 2001-10-01 | DOI: 10.1145/500141.500192
Don Kimber, J. Foote, Surapong Lertsithichai
{"title":"FlyAbout: spatially indexed panoramic video","authors":"Don Kimber, J. Foote, Surapong Lertsithichai","doi":"10.1145/500141.500192","DOIUrl":"https://doi.org/10.1145/500141.500192","url":null,"abstract":"We describe a system called FlyAbout which uses spatially indexed panoramic video for virtual reality applications. Panoramic video is captured by moving a 360@deg camera along continuous paths. Users can interactively replay the video with the ability to view any interesting object or choose a particular direction. Spatially indexed video gives the ability to travel along paths or roads with a map-like interface. At junctions, or intersection points, users can chose which path to follow as well as which direction to look, allowing interaction not available with conventional video. Combining the spatial index with a spatial databsdde of maps or objects allows users to navigate to specific locations or interactively inspect particular objects.","PeriodicalId":416848,"journal":{"name":"MULTIMEDIA '01","volume":"187 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114967735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}