{"title":"Automatic relevance feedback for video retrieval","authors":"P. Muneesawang, L. Guan","doi":"10.1109/ICME.2003.1221631","DOIUrl":"https://doi.org/10.1109/ICME.2003.1221631","url":null,"abstract":"This paper presents an automatic relevance feedback method for improving retrieval accuracy in video database. We first demonstrate a representation based on a template-frequency model (TFM) that allows the full use of the temporal dimension. We then integrate the TFM with a self-training neural network structure to adaptively capture different degrees of visual importance in a video sequence. Forward and backward signal propagation is the key in this automatic relevance feedback method in order to enhance retrieval accuracy.","PeriodicalId":118560,"journal":{"name":"2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131375255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semantic principal video shot classification via mixture Gaussian","authors":"Hangzai Luo, Jianping Fan, Jing Xiao, Xingquan Zhu","doi":"10.1109/ICME.2003.1221585","DOIUrl":"https://doi.org/10.1109/ICME.2003.1221585","url":null,"abstract":"As digital cameras become more affordable, digital video now plays an important role in medical education and healthcare. In this paper, we propose a novel framework to facilitate semantic classification of surgery education videos. Specifically, the framework includes: (a) semantic-sensitive video content characterization via principal video shots, (b) semantic video classification via a mixture Gaussian model to bridge the semantic gap between low-level visual features and semantic visual concepts in a specific surgery education video domain.","PeriodicalId":118560,"journal":{"name":"2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132387837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Offline architecture for real-time betting","authors":"Panu Hämäläinen, Marko Hännikäinen, T. Hämäläinen, Riku Soininen","doi":"10.1109/ICME.2003.1221016","DOIUrl":"https://doi.org/10.1109/ICME.2003.1221016","url":null,"abstract":"Traditional betting systems do not enable betting in real-time during an event and require decisions and preparations beforehand. In this paper a novel architecture for real-time betting is presented. It disposes of the up-front effort and enables frequent bet announcements and placements during an ongoing event. The novel situation is achieved by broadcasting announcements, time-stamping and storing the placements locally, and collecting them after the event has been finished. While solving the processing problems, the architecture requires reliable cryptographic and physical protection. Currently, e.g. DVB and LAN technologies offer potential platforms for providing the service. The implemented LAN demonstrator has shown that user interfaces have to be simple and bets should not be announced too often. It has also shown that real-time operation makes betting more inspiring.","PeriodicalId":118560,"journal":{"name":"2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130072037","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A test tool to support brute-force online and offline signature forgery tests on mobile devices","authors":"Frank Zoebisch, C. Vielhauer","doi":"10.1109/ICME.2003.1221289","DOIUrl":"https://doi.org/10.1109/ICME.2003.1221289","url":null,"abstract":"Testing of biometric systems requires the consideration of aspects beyond technical and statistical parameters. Especially for testing biometric techniques based on behavior, human factors like intention and forgery strength need to be considered. In this paper, a test tool to support skilled forgeries by test subjects is presented for handwriting verification systems. The software tool has been implemented on two computer platforms and is based on a three level forgery quality model. First experimental results are presented, which indicate that by applying the presented system in attack tests, forgeries of gradual quality can be obtained from test persons.","PeriodicalId":118560,"journal":{"name":"2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130109800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Documenting life: videography and common sense","authors":"Barbara Barry, G. Davenport","doi":"10.1109/ICME.2003.1221587","DOIUrl":"https://doi.org/10.1109/ICME.2003.1221587","url":null,"abstract":"This paper introduces a model for producing common sense metadata during video capture and describes how this technique can have a positive impact on content capture, representation, and presentation. Metadata entered into the system at the moment of capture is used to generate suggestions designed to help the videographer decide what to shoot, how to compose a shot and how to index their video material to best support their communication requirements. An approach and first experiments using a common sense database and reasoning techniques to support a partnership between the camera and videographer during video capture are presented.","PeriodicalId":118560,"journal":{"name":"2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130285766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatial correlation based fast field motion vector estimation algorithm for interlaced video encoding","authors":"K. Ramkishor, T. S. Raghu, K. Suman, Pallapothu S. S. B. K. Gupta","doi":"10.1109/ICME.2003.1221737","DOIUrl":"https://doi.org/10.1109/ICME.2003.1221737","url":null,"abstract":"A fast motion vector estimation algorithm for field motion estimation having computational complexity independent of search range is presented for interlaced video encoding. The algorithm is based on spatial correlation of motion vectors and yields good tradeoff between motion estimation distortion and number of SAD computations. The speed up achieved by the algorithm is in the order of 190-280 times and the SAD increase is 2-10% compared to full search. The number of bits required to code motion vectors is less as the algorithm is based on correlation of motion vectors. Overall performance of the algorithm is compared with full search and three-step search using speed-up, average SAD per macroblock (MB), motion vector bits per MB and PSNR as objective measure. The proposed algorithm can be used in standards such as MPEG-2, MPEG-4 ASP, etc. The proposed algorithm is implemented in our MPEG-2 encoder and performance of the algorithm is presented for the same.","PeriodicalId":118560,"journal":{"name":"2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134131927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pitch and timbre manipulations using cortical representation of sound","authors":"D. Zotkin, S. Shamma, Powen Ru, R. Duraiswami, L. Davis","doi":"10.1109/ICME.2003.1221328","DOIUrl":"https://doi.org/10.1109/ICME.2003.1221328","url":null,"abstract":"The sound receiver at the ears is processed by humans using signal processing that separate the signal along intensity, pitch and timbre dimensions. Conventional Fourier-based signal processing, while endowed with fast algorithms, is unable to easily represent signal along these attributes. In this paper we use a cortical representation to represent the manipulate sound. We briefly overview algorithms for obtaining, manipulating and inverting cortical representation of sound and describe algorithms for manipulating signal pitch and timbre separately. The algorithms are first used to create sound of an instrument between a guitar and a trumpet. Applications to creating maximally separable sounds in auditory user interfaces are discussed.","PeriodicalId":118560,"journal":{"name":"2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134540230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reconstruction of linearly parameterized models using the vanishing points from a single image","authors":"Yong-In Yoon, J. Im, Dae-Hyun Kim, Jongsoo Choi","doi":"10.1109/ICME.2003.1220899","DOIUrl":"https://doi.org/10.1109/ICME.2003.1220899","url":null,"abstract":"In this paper, we propose a new method using only three vanishing points to recover the dimensions of object and its pose from a single image with a camera of unknown focal length. Our approach is to compute the dimensions of objects represented by the unit vector of objects from an image. The dimension vector v can be solved by the standard nonlinear optimization techniques with a multistart method which generates multiple starting points for the optimizer by sampling the parameter space uniformly. This method allows model-based vision to be computed for the dimensions of object for a 3D model from matches to a single 2D image. Experimental results show the actual dimensions of object from an image agree well with the calculated results.","PeriodicalId":118560,"journal":{"name":"2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698)","volume":"329 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134366722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hidden Markov model-based speech emotion recognition","authors":"Björn Schuller, G. Rigoll, M. Lang","doi":"10.1109/ICME.2003.1220939","DOIUrl":"https://doi.org/10.1109/ICME.2003.1220939","url":null,"abstract":"In this contribution we introduce speech emotion recognition by use of continuous hidden Markov models. Two methods are propagated and compared throughout the paper. Within the first method a global statistics framework of an utterance is classified by Gaussian mixture models using derived features of the raw pitch and energy contour of the speech signal. A second method introduces increased temporal complexity applying continuous hidden Markov models considering several states using low-level instantaneous features instead of global statistics. The paper addresses the design of working recognition engines and results achieved with respect to the alluded alternatives. A speech corpus consisting of acted and spontaneous emotion samples in German and English language is described in detail. Both engines have been tested and trained using this equivalent speech corpus. Results in recognition of seven discrete emotions exceeded 86% recognition rate. As a basis of comparison the similar judgment of human deciders classifying the same corpus at 79.8% recognition rate was analyzed.","PeriodicalId":118560,"journal":{"name":"2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131534468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic 3D face verification from range data","authors":"Gang Pan, Zhaohui Wu, Yunhe Pan","doi":"10.1109/ICME.2003.1221266","DOIUrl":"https://doi.org/10.1109/ICME.2003.1221266","url":null,"abstract":"In this paper, we presented a novel approach for automatic 3D face verification from range data. The method consists of range data registration and comparison. There are two steps in registration procedure: the coarse step conducting the normalization by exploiting a priori knowledge of the human face and facial features, and the fine step aligning the input data with the model stored in the database by the partial directed Hausdorff distance. To speed up the registration, a simplified version of the model is generated for each model in the model database. During the face comparison, the partial Hausdorff distance is employed as the similarity metric. The experiments are carried out on a database with 30 individuals and the best EER of 3.24% is achieved.","PeriodicalId":118560,"journal":{"name":"2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131579308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}