{"title":"Automatic feedback for content based image retrieval on the Web","authors":"Y. Aslandogan, Clement T. Yu","doi":"10.1109/ICME.2002.1035758","DOIUrl":"https://doi.org/10.1109/ICME.2002.1035758","url":null,"abstract":"We address the problem of identifying images of persons in large collections, such as the Web, without an existing face image database. We describe a method and a system that automatically constructs an initial face image database for a person using textual evidence obtained from the Web, and then uses this database for identifying images of that person. The initial retrieval results are obtained via text/HTML analysis and face detection. An internal clustering process groups visually similar faces among these initial results and builds a facial database. This database is then used by a face recognizer. The outputs of the textual and visual evidence modules are combined using Dempster-Shafer (1976) evidence combination formula. We present the results of an experimental evaluation where the system was able to improve upon the detection-only method when text/HTML analysis performed poorly.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"82 1","pages":"221-224 vol.1"},"PeriodicalIF":0.0,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82064319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image retrieval based on multi-scale edge model","authors":"P. Bao, Xianjun Zhang","doi":"10.1109/ICME.2002.1035627","DOIUrl":"https://doi.org/10.1109/ICME.2002.1035627","url":null,"abstract":"We propose a novel scheme for image retrieval using a wavelet based multi-scale edge model. All images in the database are decomposed into their multi-scale primal sketch and the background images respectively. The images are stored in the form of the extracted edge structures and background. The similarities between query image and the images in the database are measured based on the statistics of edges structures. The multi-scale edge modeling of image database can also be performed real-time to enable the image retrieval on arbitrary image databases. Experiment shows that the proposed scheme gives promising retrieval performance over the conventional retrieval methods.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"46 1","pages":"417-420 vol.2"},"PeriodicalIF":0.0,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82098336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bit-plane error recovery via cross subband for image transmission in JPEG2000","authors":"Pei-Jun Lee, Liang-Gee Chen","doi":"10.1109/ICME.2002.1035740","DOIUrl":"https://doi.org/10.1109/ICME.2002.1035740","url":null,"abstract":"For multimedia transmission over noisy channels, the error robustness of JPEG2000 evidently outperforms that of JPEG. Since JPEG2000 is based on the discrete wavelet transform (DWT), traditional error concealment algorithms for still images in the discrete cosine transform (DCT) domain are not suitable for JPEG2000. In JPEG2000, decoding is processed bitplane by bitplane. Any data loss occurring in the bitstream will affect the consequent bitplanes and their wavelet coefficients. To solve this problem, the JPEG2000 VM7.2 program replaces the missing wavelet coefficients by zeros. However, the replacement may affect lots of significant nonzero coefficients such that some high frequency components are lost. In this paper, we present a novel error concealment algorithm for image transmission in the bitplane base. The proposed algorithm recovers the damaged bitplane data according to the cross subband and undamaged bitplane information. The recovered wavelet coefficients are similar with error-free data. The objective results show that the proposed algorithm has 3/spl sim/8dB improvement than those without the error resilient mechanism. From a subjective viewpoint, the proposed algorithm can achieve much smoother edges on the reconstructed image using our concealment algorithm.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"40 1","pages":"149-152 vol.1"},"PeriodicalIF":0.0,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82292419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-optimized spectral correlation method for background music identification","authors":"M. Abe, M. Nishiguchi","doi":"10.1109/ICME.2002.1035786","DOIUrl":"https://doi.org/10.1109/ICME.2002.1035786","url":null,"abstract":"This paper proposes a new method of detecting a known reference signal in an input signal highly corrupted by other sounds. One major application of the method is the identification of broadcast background music corrupted by speech. In this method, the reference signal is first decomposed into a number of small time-frequency components, and the maximum similarity between each component and the input is calculated. The similarities for all the components are then integrated by a voting method. Finally, the result is used to determine whether or not the reference exists in the input; and if it exists, to determine its position. Experiments on the identification of background music and the classification of similar TV commercials have shown that this method can identify 100% of target signals with an SNR of -10dB.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"183 1","pages":"333-336 vol.1"},"PeriodicalIF":0.0,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80462545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimized video streaming for networks with varying delay","authors":"S. Wee, Wai-tian Tan, J. Apostolopoulos, M. Etoh","doi":"10.1109/ICME.2002.1035431","DOIUrl":"https://doi.org/10.1109/ICME.2002.1035431","url":null,"abstract":"This paper presents a method for distortion-optimized streaming of predictively coded video over packet networks with varying delay. In networks with significant delay variations, coded video frames can arrive late at the decoder and miss their respective display deadlines. Furthermore, due to predictive coding, a late frame can also prevent a number of subsequent frames from being displayed properly, where the number of affected frames or degree of distortion depends on the particular coding dependencies of the late frame. In this paper, we present an optimized video streaming strategy based on frame reordering for networks with significant delay variations. This streaming strategy minimizes distortion by exploiting the fact that different late frames result in different degrees of distortion. We model the router-induced delay in a wired network with an analytical PDF and we model the link-layer retransmission delay of a wireless network with the 3GPP specification for W-CDMA radio link control. We compute the distortion for different frame reorderings using the network delay models and a source model that accounts for the prediction dependencies of predictively coded video. Our optimized streaming strategies are shown to reduce the number of late frames by 14 to 23% for the situations examined.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"83 1","pages":"89-92 vol.2"},"PeriodicalIF":0.0,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80650185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interactive room acoustic rendering in real time","authors":"L. Savioja, T. Lokki, J. Huopaniemi","doi":"10.1109/ICME.2002.1035827","DOIUrl":"https://doi.org/10.1109/ICME.2002.1035827","url":null,"abstract":"The goal of this paper is to give an overview of real-time room acoustic rendering. The approach is based on the source-medium-receiver model, in which we model sound sources, room acoustics, and a listener. The basic techniques for each of these are presented, but the main emphasis is on the room acoustic modeling and interactive auralization. As a case study we present the structure of the DIVA auralization system developed at the Helsinki University of Technology. In addition, we describe subjective evaluations made to our system. Finally, a discussion of some applications of virtual acoustics and their computational needs are given.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"90 1","pages":"497-500 vol.1"},"PeriodicalIF":0.0,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80680680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On model-based clustering of video scenes using scenelets","authors":"Hong Lu, Yap-Peng Tan","doi":"10.1109/ICME.2002.1035778","DOIUrl":"https://doi.org/10.1109/ICME.2002.1035778","url":null,"abstract":"We propose in this paper a model-based approach to clustering video scenes based on scenelets. We define a video scenelet as a short consecutive sample of frames of a video sequence. The approach makes use of an unsupervised method to represent scenelets of a video with a concise Gaussian mixture model and cluster them into different video scenes according to their visual similarities. In particular the expectation-maximization algorithm is employed to estimate the unknown model parameters, and Bayesian information criterion is used to determine the optimal number and model of scene clusters in a principled manner. This approach is fundamentally different from many existing video clustering methods, as it does not require explicit knowledge of shot boundaries. Instead, the shot boundaries can also be obtained as a by-product of the scene clustering process. The proposed methods have been tested with various types of sports videos and promising results are reported in this paper.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"20 1","pages":"301-304 vol.1"},"PeriodicalIF":0.0,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82575496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Universal MPEG content access using compressed-domain system stream editing techniques","authors":"Ching-Yung Lin, Belle L. Tseng, John R. Smith","doi":"10.1109/ICME.2002.1035419","DOIUrl":"https://doi.org/10.1109/ICME.2002.1035419","url":null,"abstract":"An MPEG system layer compressed-domain editing technique is proposed to facilitate the delivery and integration of multiple segments of MPEG files, residing on remote databases. Various multimedia applications, including retrieval and summarization, split MPEG files into small segments along shot boundaries and store them separately. This traditional method requires extra management and storage payload, provides only fixed segmentations, and may not be play smoothly. In order to solve this problem, our MPEG system-domain editing tool directly extracts video-audio information from the original MPEG sources and combines them to generate a single MPEG file. Manipulated wholly in the system bitstream domain, this method does not require decoding, re-encoding, and re-synchronization of audio and video data. Thus, it operates in real-time and provides great flexibility. This composite MPEG file can be transmitted and displayed through general Web interfaces. The proposed method is applied to our video retrieval, video summarization, and video editing systems, and has shown its great advantages.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"1 1","pages":"73-76 vol.2"},"PeriodicalIF":0.0,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82918977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A cost-effective solution for eye-gaze assistive technology","authors":"Fulvio Corno, L. Farinetti, I. Signorile","doi":"10.1109/ICME.2002.1035632","DOIUrl":"https://doi.org/10.1109/ICME.2002.1035632","url":null,"abstract":"The problem of assisting people with special needs is assuming a central role in our society, and information and communication technologies are asked to have a key role in aiding people with both physical and cognitive disabilities. This paper describes an eye tracking system, whose strong points are the simplicity and the consequent affordability of costs, designed and implemented to allow people with severe motor disabilities to use gaze as an input device for selecting areas on a computer screen. The motivation for this kind of input device, together with the communication impairments that it may help to solve are reported in the paper, that then describes the adopted technical solution, compared to existing approaches, and reports the results obtained by its experimentation.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"38 1","pages":"433-436 vol.2"},"PeriodicalIF":0.0,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90222735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Retrieval of articulate objects from images and video using invariant signatures","authors":"Ronald-Bryan O. Alferez, Yuan-fang Wang","doi":"10.1109/ICME.2002.1035757","DOIUrl":"https://doi.org/10.1109/ICME.2002.1035757","url":null,"abstract":"We propose a new method of retrieving multi-part, articulate objects from images and video. The scheme is particularly well suited for analyzing images and video for objects that can pose differently with possible shape deformation and articulated motion. The scheme involves computing an invariant signature for each segmented region in the image, in a manner that is insensitive to translation, rotation, scale, and shear. Using circular cross-correlation, these signatures can then be efficiently compared with that of user-defined regions of interest. Ambiguities between individual region matches are then resolved through relaxation labeling techniques. A final match is established when a collection of segmented regions conform to the query object, both in terms of local shape description and global structural relation. The scheme thus allows for articulated movement of object parts within the scene. The procedure is easy to implement, yet shows promising results in its ability to isolate interesting regions in images and video, to account for structural and relational constraints among regions, and to integrate both local shape and global structural information for a detailed examination of the scene in a way that is invariant to many visual variations.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"6 1","pages":"217-220 vol.1"},"PeriodicalIF":0.0,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89260330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}