{"title":"Multilevel Quadratic Variation Minimization for 3D Face Modeling and Virtual View Synthesis","authors":"Xiaozheng Zhang, Yongsheng Gao, M. Leung","doi":"10.1109/MMMC.2005.55","DOIUrl":"https://doi.org/10.1109/MMMC.2005.55","url":null,"abstract":"One of the key remaining problems in face recognition is that of handling the variability in appearance due to changes in pose. One strategy is to synthesize virtual face views from real views. In this paper, a novel 3D face shape-modeling algorithm, Multilevel Quadratic Variation Minimization (MQVM), is proposed. Our method makes sole use of two orthogonal real views of a face, i.e., the frontal and profile views. By applying quadratic variation minimization iteratively in a coarse-to-fine hierarchy of control lattices, the MQVM algorithm can generate C²-smooth 3D face surfaces. Then realistic virtual face views can be synthesized by rotating the 3D models. The algorithm works properly on sparse constraint points and large images. It is much more efficient than single-level quadratic variation minimization. The modeling results suggest the validity of the MQVM algorithm for 3D face modeling and 2D face view synthesis under different poses.","PeriodicalId":121228,"journal":{"name":"11th International Multimedia Modelling Conference","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121704118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Framework for Script Based Virtual Directing and Multimedia Authoring in Live Video Streaming","authors":"R. Xu, Jesse S. Jin, J. G. Allen","doi":"10.1109/MMMC.2005.41","DOIUrl":"https://doi.org/10.1109/MMMC.2005.41","url":null,"abstract":"We propose a novel framework that facilitates automatic editing and authoring of multimedia using static and moving cameras in a dynamic scene. The framework incorporates several video techniques such as object tracking using mean shift and object recognition using Scaled Invariant Feature Transform (SIFT). These techniques are linked together by a comprehensive yet simple-to-program script authoring mechanism based on video event detection. These combined features empower the system to play a virtual director role in live video stream editing and multimedia integration. The system requires minimum human intervention and can leverage production efficiency for both novice and professional users. The experimental results from our prototype system demonstrate that this framework is achievable using inexpensive hardware and standard video cameras. Our system provides comprehensive pre-production authoring capabilities that lend towards integration of video and heterogonous multimedia elements in realtime. 
We have found this framework to be useful in many applications such as live video streaming, distance education, live entertainment, sports coverage and personal video broadcasting.","PeriodicalId":121228,"journal":{"name":"11th International Multimedia Modelling Conference","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131520148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"COSMOS-7: A Video Content Modeling Framework for MPEG-7","authors":"Athanasios C. Gkoritsas, M. Angelides","doi":"10.1109/MMMC.2005.31","DOIUrl":"https://doi.org/10.1109/MMMC.2005.31","url":null,"abstract":"As the amount of available multimedia documents is increasing, it is becoming harder to access and manage such files according to what they represent. Since the well-known text-based search engines are not adequate for that task, content modeling is assisting by providing information not about the actual bits that constitute a media file, but about their meaning, in other words the bits about the bits. The COSMOS model successfully integrates low level and high-level semantics into a single framework, and is therefore a complete and operational scheme for describing content related information. The MPEG-7 scheme standardizes content modeling and since its introduction in 1999 has now reached a mature state. As applications to create MPEG-7 content are scarce, the COSMOS model can now not only be used to create multimedia content but also act as an application for creating MPEG-7 compliant output.","PeriodicalId":121228,"journal":{"name":"11th International Multimedia Modelling Conference","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114972156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effective Feature Extraction for Play Detection in American Football Video","authors":"Tie-Yan Liu, Wei-Ying Ma, HongJiang Zhang","doi":"10.1109/MMMC.2005.37","DOIUrl":"https://doi.org/10.1109/MMMC.2005.37","url":null,"abstract":"The fact that a typical broadcast can last over 3 hours for a game of 60 minutes makes video summarization of American football games most desirable. In this paper, we present several feature extraction methods for play detection in American football video. Wavelet based motion analysis is used to extract the trend component from the noisy motion vectors; a hybrid field-color model detects field area with both high accuracy and fast speed; and a prior knowledge driven line detection method uses the court information to estimate miss-detections. Based on the so-extracted features, a boosting chain is used for feature selection and decision making. Tested on large-size video data, the detection performance of our work is very promising.","PeriodicalId":121228,"journal":{"name":"11th International Multimedia Modelling Conference","volume":"161 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131981603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Music Key Detection for Musical Audio","authors":"Yongwei Zhu, M. Kankanhalli, Sheng Gao","doi":"10.1109/MMMC.2005.56","DOIUrl":"https://doi.org/10.1109/MMMC.2005.56","url":null,"abstract":"The key or the scale information of a piece of music provides important clues on its high level musical content, like harmonic and melodic context, which can be useful for music classification, retrieval or further content analysis. Researchers have previously addressed the issue of finding the key for symbolically encoded music (MIDI); however, very little work has been done on key detection for acoustic music. In this paper, we present a method for estimating the root of diatonic scale and the key directly from acoustic signals (waveform) of popular and classical music. We propose a method to extract pitch profile features from the audio signal, which characterizes the tone distribution in the music. The diatonic scale root and key are estimated based on the extracted pitch profile by using a tone clustering algorithm and utilizing the tone structure of keys. Experiments on 72 music pieces have been conducted to evaluate the proposed techniques. The success rate of scale root detection for pop music pieces is above 90%.","PeriodicalId":121228,"journal":{"name":"11th International Multimedia Modelling Conference","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114125480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generative and Discriminative Modeling toward Semantic Context Detection in Audio Tracks","authors":"W. Chu, Wen-Huang Cheng, Ja-Ling Wu","doi":"10.1109/MMMC.2005.42","DOIUrl":"https://doi.org/10.1109/MMMC.2005.42","url":null,"abstract":"Semantic-level content analysis is a crucial issue to achieve efficient content retrieval and management. We propose a hierarchical approach that models the statistical characteristics of several audio events over a time series to accomplish semantic context detection. Two stages, including audio event and semantic context modeling/testing, are devised to bridge the semantic gap between physical audio features and semantic concepts. For action movies we focused in this work, hidden Markov models (HMMs) are used to model four representative audio events, i.e. gunshot, explosion, car-braking, and engine sounds. At the semantic context level, generative (ergodic hidden Markov model) and discriminative (support vector machine, SVM) approaches are investigated to fuse the characteristics and correlations among various audio events, which provide cues for detecting gunplay and car-chasing scenes. The experimental results demonstrate the effectiveness of the proposed approaches and draw a sketch for semantic indexing and retrieval. 
Moreover, the differences between two fusion schemes are discussed to be the reference for future research.","PeriodicalId":121228,"journal":{"name":"11th International Multimedia Modelling Conference","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130235117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Feature Relevance Learning in Content-Based Image Retrieval Using GRA","authors":"Kui Cao","doi":"10.1109/MMMC.2005.40","DOIUrl":"https://doi.org/10.1109/MMMC.2005.40","url":null,"abstract":"In the uncertain and incomplete system study, the Grey Relational Analysis(GRA) method in grey system theory throws emphasis on the problem of \"small-sized data samples, poor information and uncertainty\" which cannot be handled by traditional statistics. As user’s query requirement may be ambiguous and subjective sometimes in content-based image retrieval, the query results are uncertain to some extent; therefore, retrieval process can be treated as a grey system, and the query vectors and the weight values of image features as the grey numbers. So, it is a good approach for us to develop a relevance feedback technique for content-based image retrieval using the GRA method in grey system theory. In this paper, we propose a novel relevance feedback technique for content-based image retrieval using the GRA method in the grey system theory. The key idea of the proposed approach is the grey relational analysis of the feature distributions of images the user has judged relevant, in order to understand what features have been taken into account (and to what extent) by the user in formulating this judgment, so that we can accentuate the influence of these features in the overall evaluation of image similarity. The proposed method, which allows the user to retrieve the image database and progressively refine system’s response to the query by indicating the degree of relevance of retrieved images, dynamically updates the query vectors and the weights for similarity measure in order to accurately represent the user’s particular information needs. 
Experimental results show that the proposed approach captures the user’s information needs more precisely.","PeriodicalId":121228,"journal":{"name":"11th International Multimedia Modelling Conference","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128815347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Retrieval of News Video Using Video Sequence Matching","authors":"Young-tae Kim, Tat-Seng Chua","doi":"10.1109/MMMC.2005.63","DOIUrl":"https://doi.org/10.1109/MMMC.2005.63","url":null,"abstract":"In this paper, we propose a new algorithm to find video clips with different temporal durations and some spatial variations. We adopt a longest common sub-sequence (LCS) matching technique for measuring the temporal similarity between video clips. Based on the measure we propose 3 techniques to improve the retrieval effectiveness. First, we use a few coefficients in the low frequency region of DCT block as the basis to represent spatial features. Second, we heuristically determine a suitable quantization step-size for visual features to better tolerate spatial variations of similar video clips and propose a paired quantizer method. Third, we incorporate the compactness and/or continuity of matched common sub-sequences in the LCS measure to better reflect temporal characteristics of video. The performance of the proposed algorithm shows an improvement of 63.5% in terms of MAP (mean average precision) as compared to an existing algorithm. The results show that our approach is effective for news video retrieval.","PeriodicalId":121228,"journal":{"name":"11th International Multimedia Modelling Conference","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127946164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interactive Visual Retrieval System for Large Scale 3D Models Database","authors":"Weibin Liu, Y. Uehara, Hao Yu, D. Masumoto, Yi Liu, Jiantao Pu, H. Zha","doi":"10.1109/MMMC.2005.50","DOIUrl":"https://doi.org/10.1109/MMMC.2005.50","url":null,"abstract":"This paper focuses on the key algorithms and techniques for developing an interactive visual retrieval system for large scale 3D databases, and a novel 3D model retrieval and visualization engine, 3DMIRACLES, has been developed, which integrates effective algorithms and techniques for both shape-based retrieval of 3D models and real-time visualization of the retrieval results in realistic 3D interactive mode. In the retrieval system, interactive visualization for the retrieval user interface and 3D shape retrieval computation are the two most important functional modules. For interactive visualization, a novel 3D viewer has been developed, which implements hybrid rendering method to make much simplification and shortcut processing of 3D rendering computation for achieving high speed and efficient visualization of large scale database; for retrieval computation, new algorithms for 3D shape feature extraction and similarity matching have been developed and implemented.","PeriodicalId":121228,"journal":{"name":"11th International Multimedia Modelling Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129561836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Meta Data Extraction from Linguistic Meeting Transcripts for the Annodex File Format","authors":"Claudia Schremmer, S. Pfeiffer","doi":"10.1109/MMMC.2005.53","DOIUrl":"https://doi.org/10.1109/MMMC.2005.53","url":null,"abstract":"Semantic interpretation of the data distributed over the Internet is subject to major current research activity. The Continuous Media Web (CMWeb) extends the World Wide Web to time-continuously sampled data such as audio and video in regard to the searching, linking, and browsing functionality. The CMWeb technology is based the file format Annodex which streams the media content interspersed with markup in the Continuous Media Markup Language (CMML) format that contains information relevant to the whole media file, e.g., title, author, language as well as time-sensitive information, e.g., topics, speakers, time-sensitive hyperlinks. The CMML markup may be generated manually or automatically. This paper investigates the automatic extraction of meta data and markup information from complex linguistic annotations, which are annotated recordings collected for use in linguistic research. We are particularly interested in annotated recordings of meetings and teleconferences and see automatically generated CMML files and their corresponding Annodex streams as one way of viewing such recordings. 
The paper presents some experiments with generating Annodex files from hand-annotated meeting recordings.","PeriodicalId":121228,"journal":{"name":"11th International Multimedia Modelling Conference","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125344696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}