{"title":"The hierarchical cluster model for image region segmentation","authors":"J. Randall, L. Guan, Xing Zhang, W. Li","doi":"10.1109/ICME.2002.1035876","DOIUrl":"https://doi.org/10.1109/ICME.2002.1035876","url":null,"abstract":"The hierarchical cluster model (HCM), a neural network inspired by the human brain (see Sutton, J., Harvard Medical School, MIT, Neural Systems Group, Technical Report, 1995), is demonstrated for the purpose of region segmentation in digital images. Starting with an over segmented image, regions are merged based on evidence of a valid edge between the two regions. Unlike Sutton's work, in which the HCM is used to recall a set of pre-trained memory patterns, the HCM in our work demonstrates unsupervised decision making capabilities.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"93 1","pages":"693-696 vol.1"},"PeriodicalIF":0.0,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83464951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gaussian mixture model for relevance feedback in image retrieval","authors":"Fang Qian, Mingjing Li, Lei Zhang, HongJiang Zhang, Bo Zhang","doi":"10.1109/ICME.2002.1035760","DOIUrl":"https://doi.org/10.1109/ICME.2002.1035760","url":null,"abstract":"Relevance feedback (RF) has become a powerful technique in content-based image retrieval. Most RF methods assume that positive images follow the single Gaussian distribution, which is not sufficient to model the actual distribution of images due to the gap between the semantic concept and low-level features. In this paper, the Gaussian mixture model (GMM) is applied to represent the distribution of positive images in relevance feedback, and a novel method is proposed to estimate the parameters of the GMM. Both positive and negative examples are used to estimate the number of Gaussian components. Furthermore, due to the lack of training samples, unlabeled data are also incorporated to estimate the covariance matrices. Experimental results show that our GMM-based RF method outperforms that based on a single Gaussian model.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"22 1","pages":"229-232 vol.1"},"PeriodicalIF":0.0,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83494073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An improved algorithm for removing impulse noise based on long-range correlation in an image","authors":"Yik-Hing Fung, Y. Chan","doi":"10.1109/ICME.2002.1035742","DOIUrl":"https://doi.org/10.1109/ICME.2002.1035742","url":null,"abstract":"An algorithm for removing impulse noise is proposed. The algorithm not only uses the information of neighboring pixels in a local region, but also uses the long-range correlation within a natural image. In the proposed algorithm, a search criterion based on the weighted transformed contents of regions is used to look for a region which is highly correlated to the region of interest and then the center of that region is selected to replace the corrupted pixel. The method is found to be very effective in removing impulse noise from corrupted images, both in terms of the objective distortion measure and subjective visual assessment.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"18 1","pages":"157-160 vol.1"},"PeriodicalIF":0.0,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86157161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Region-based nonparametric optical flow segmentation with pre-clustering and post-clustering","authors":"K. Ma, Hai-Yun Wang","doi":"10.1109/ICME.2002.1035548","DOIUrl":"https://doi.org/10.1109/ICME.2002.1035548","url":null,"abstract":"A region-based nonparametric video object segmentation over an optical-flow field is proposed to overcome the drawbacks inherited in pixel-based parametric approaches. The key novelties of this approach are: (1) motion field smoothing; (2) pre-clustering and post-clustering. By utilizing both spatial and temporal information extracted from the input video sequence, the raw optical-flow field is partitioned into homogeneous regions, with each region undergoing a common translational motion. Such an objective can be achieved through iterative spatio-temporal processing until the predetermined error-tolerance threshold is met. To facilitate fuzzy c-means clustering, pre-clustering and post-clustering are proposed. Experimental results demonstrate that they also effectively contribute a much improved performance in video object segmentation.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"6 1","pages":"201-204 vol.2"},"PeriodicalIF":0.0,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88520889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Previs: a person-specific realistic virtual speaker","authors":"J. Melenchón, Francesc Alías, Ignasi Iriondo Sanz","doi":"10.1109/ICME.2002.1035818","DOIUrl":"https://doi.org/10.1109/ICME.2002.1035818","url":null,"abstract":"This paper describes a 2D realistic talking face. The facial appearance model is constructed with a parameterised 2D sample based model. This representation supports moderated head movements, facial gestures and emotional expressions. Two main contributions for talking heads applications are proposed. First, the image of the lips is synthesized by means of shape and texture information. Secondly, a nearly automated training process makes the talking face personalization easier, due to the use of mouth tracking. Additionally, lips are synchronized in real time with speech that is generated using a SAPI compliant text-to-speech engine.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"46 1","pages":"461-464 vol.1"},"PeriodicalIF":0.0,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88784262","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A study on automatic database creation and summarization of a vaulting horse class","authors":"H. Miyamori","doi":"10.1109/ICME.2002.1035611","DOIUrl":"https://doi.org/10.1109/ICME.2002.1035611","url":null,"abstract":"This paper studies a method of automatically creating a database and effectively displaying a visual summary of a lesson, using content taken from a vaulting horse class. The study aims at establishing a content creation tool that can provide practical functions such as content-based retrieval and summarization, and content that can be shared by teachers and students over the network. The study also aims at clarifying how effectively such tools assist education in an actual classroom. First, the system records the scenes of vaulting actions in the class using a camera fixed beside the vaulting horse. Then it automatically extracts each jumping behavior, puts indices on key actions, and registers them into the database. Indexing on key actions is done by analyzing spatial relations and their temporal transitions between featured points of the student's circumscribed rectangle and those of a vaulting horse model. The system creates MPEG-7 based metadata for possible data exchange over the network. Experimental results show that the database registration of student ID according to the predefined recording of action sequence enables easy access and display of a student's actions, and of specific vaulting actions for comparison. These results also indicate that the summarized display of a key action's thumbnails promotes visual and objective understanding and confirmation of individual jumping movements and other key motions to improve performance.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"1 1","pages":"381-384 vol.2"},"PeriodicalIF":0.0,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86053715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimedia courseware development using influence diagram","authors":"T. Shih, Lun-Ping Hung","doi":"10.1109/ICME.2002.1035610","DOIUrl":"https://doi.org/10.1109/ICME.2002.1035610","url":null,"abstract":"Web-based distance learning programs are widely available. A few distance education platform and standards were developed or proposed. Among current software systems, it is hard to realize a strategic assessment of courseware quality. Since one of the difficulties of distance education is the load that an instructor needs to spend in courseware design, it is worthy to investigate an automatic mechanism to help an instructor to produce effective courseware. Thus, a distance learning program can proceed efficiently. We develop a mechanism for the construction of courseware structure based on the influence diagram. The mechanism can be implemented as a decision support system for the instructor to analyze the relation among course units and test units. The overall value of a courseware can be systematically analyzed.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"54 1","pages":"377-380 vol.2"},"PeriodicalIF":0.0,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73564212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Template-based image retrieval","authors":"J. Hsieh, W. Grimson","doi":"10.1109/ICME.2002.1035749","DOIUrl":"https://doi.org/10.1109/ICME.2002.1035749","url":null,"abstract":"The paper presents a TREE (templates and their relationship extraction and estimation) algorithm for indexing images from picture libraries with more semantics-sensitive meanings. In this approach, each image is represented by a set of templates and their spatial relationships as keys to capture the essence of the image. Each template is characterized by a set of dominant regions, which reflect different appearances of an object at different conditions and can be obtained by the proposed TEA (template extraction and analysis) algorithm through region matching. The STREAM (spatial template relationship extraction and measurement) algorithm is then proposed for obtaining the spatial relations between these extracted templates. Due to the nature of a template, which can represent various appearances of an object at different conditions, the proposed approach can provide better capabilities and flexibilities to capture image contents than other traditional region-based methods. Besides, through maintaining the spatial layout of images, the semantic meanings hidden in the query images can be extracted and lead to significant improvements in the accuracy of image retrieval.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"29 1","pages":"185-188 vol.1"},"PeriodicalIF":0.0,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85864304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Music style mining and classification by melody","authors":"M. Shan, Fang-Fei Kuo, Mao-Fu Chen","doi":"10.1109/ICME.2002.1035727","DOIUrl":"https://doi.org/10.1109/ICME.2002.1035727","url":null,"abstract":"Music style is one of the features that people used to classify music. Discovery of music style is helpful for the design of a content-based music retrieval system. In this paper we investigate the mining and classification of music style by melody from a collection of MIDI music. We extract the chord from the melody and investigate the representation of extracted features and corresponding mining techniques for music classification. Experimental results show that the classification accuracy is about 70% to 84% for 2-way classification.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"66 1","pages":"97-100 vol.1"},"PeriodicalIF":0.0,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86387324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Full-duplex communication systems using loudspeaker arrays and microphone arrays","authors":"H. Buchner, S. Spors, Walter Kellermann, R. Rabenstein","doi":"10.1109/ICME.2002.1035830","DOIUrl":"https://doi.org/10.1109/ICME.2002.1035830","url":null,"abstract":"For high-quality multimedia communication systems, such as teleconferencing, or tele-teaching (especially of music), multichannel sound reproduction is highly desirable. While current approaches still rely on a restrained listening area, the sweet spot, a volume solution for a large listening space is offered by the wave field synthesis (WFS) method, where arrays of loudspeakers generate a prespecified sound field. On the recording side of the two-way systems, the use of microphone arrays is an effective approach to cope with undesired signal components in the receiving room. However, before full-duplex communication can be deployed, efficient approaches to the acoustic echo cancellation (AEC) problem in this challenging scenario have to be found. We investigate different options for system integration, after a brief discussion of the current state of the art. We then present a first real-time solution on a regular PC platform, based on an efficient AEC for MIMO (multi-input and multi-output) systems in the frequency-domain.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"39 1","pages":"509-512 vol.1"},"PeriodicalIF":0.0,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81327754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}