{"title":"Noise robust hands-free speech recognition using microphone array and Kalman filter as front-end system of conversational TV","authors":"M. Fujimoto, Y. Ariki","doi":"10.1109/MMSP.2002.1203297","DOIUrl":"https://doi.org/10.1109/MMSP.2002.1203297","url":null,"abstract":"In this paper, we investigate hands-free speech recognition as a front-end system for conversational TV. Conversational TV is a machine conversation system that lets the user retrieve information of interest by asking the TV for it. To realize natural machine conversation in which the user need not be conscious of the microphone, hands-free speech recognition is required. In the hands-free speech recognition system, the directions of the arriving signals are estimated using a microphone array, and the desired signal is enhanced by beamforming. Then, the user's utterance section is detected automatically from the continuously observed signal. Furthermore, by applying noise reduction and noise adaptation, the enhanced speech signal is recognized accurately.","PeriodicalId":398813,"journal":{"name":"2002 IEEE Workshop on Multimedia Signal Processing.","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124724529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video retrieval using an adaptive video indexing technique and automatic relevance feedback","authors":"P. Muneesawang, L. Guan","doi":"10.1109/MMSP.2002.1203286","DOIUrl":"https://doi.org/10.1109/MMSP.2002.1203286","url":null,"abstract":"This work demonstrates content-based retrieval techniques for video databases using adaptive video indexing (AVI) and a neural network model. AVI uses a \"template frequency model\" to embed spatio-temporal content, which is key to characterizing the time-varying nature of video. This model can naturally be adopted to characterize video at the shot, group, and story levels, in order to facilitate a multiple-level access video database. The AVI retrieval system achieves substantially higher retrieval accuracy than key-frame based video indexing (KFVI), a popular benchmark for video retrieval. Furthermore, the AVI structure can be integrated into a specialized neural network model to perform automatic relevance feedback retrieval. This offers advantages both in minimizing human-user involvement and in considerably enhancing retrieval accuracy in the context of adaptive systems.","PeriodicalId":398813,"journal":{"name":"2002 IEEE Workshop on Multimedia Signal Processing.","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127779338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Wide baseline image registration using prior information","authors":"A. Roy-Chowdhury, R. Chellappa, T. Keaton","doi":"10.1109/MMSP.2002.1203242","DOIUrl":"https://doi.org/10.1109/MMSP.2002.1203242","url":null,"abstract":"Establishing correspondence between features in two images of the same scene taken from different viewing angles is a challenging problem in image processing and computer vision. However, its solution is an important step in many applications such as wide baseline stereo, 3D model alignment, and creation of panoramic views. In this paper, we propose a technique for registration of two images of a face obtained from different viewing angles. We show that prior information about the general characteristics of a face obtained from video sequences of different faces can be used to design a robust correspondence algorithm. The method works by matching 2D shapes of the different features of the face. A doubly stochastic matrix, representing the probability of match between the features, is derived using the Sinkhorn normalization procedure. The final correspondence is obtained by minimizing the probability of error of a match between the entire constellations of features in the two sets, thus taking into account the global spatial configuration of the features. The method is applied for creating holistic 3D models of a face from partial representations. Although this paper focuses primarily on faces, the algorithm can also be used for other objects with small modifications.","PeriodicalId":398813,"journal":{"name":"2002 IEEE Workshop on Multimedia Signal Processing.","volume":"169 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121039168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hidden Markov model for automatic transcription of MIDI signals","authors":"Haruto Takeda, N. Saito, Tomoshi Otsuki, M. Nakai, H. Shimodaira, S. Sagayama","doi":"10.1109/MMSP.2002.1203337","DOIUrl":"https://doi.org/10.1109/MMSP.2002.1203337","url":null,"abstract":"This paper describes a Hidden Markov Model (HMM)-based method for automatic transcription of MIDI (Musical Instrument Digital Interface) signals of performed music. The problem is formulated as recognizing a given sequence of fluctuating note durations to find the most likely intended note sequence, using modern continuous speech recognition techniques. By combining a stochastic model of deviating note durations with a stochastic grammar representing possible sequences of notes, the maximum likelihood estimate of the note sequence is found with the Viterbi algorithm. The same principle is successfully applied to the joint problem of bar line allocation, time measure recognition, and tempo estimation. Finally, durations of consecutive /spl eta/n notes are combined to form a \"rhythm vector\" representing tempo-free relative durations of the notes and treated in the same framework. Significant improvements compared with conventional \"quantization\" techniques are shown.","PeriodicalId":398813,"journal":{"name":"2002 IEEE Workshop on Multimedia Signal Processing.","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131571210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Musical query-by-description as a multiclass learning problem","authors":"B. Whitman, R. Rifkin","doi":"10.1109/MMSP.2002.1203270","DOIUrl":"https://doi.org/10.1109/MMSP.2002.1203270","url":null,"abstract":"We present the query-by-description (QBD) component of \"Kandem\", a time-aware music retrieval system. The QBD system we describe learns a relation between descriptive text concerning a musical artist and their actual acoustic output, making such queries as \"Play me something loud with an electronic beat\" possible by merely analyzing the audio content of a database. We show a novel machine learning technique based on regularized least-squares classification (RLSC) that can quickly and efficiently learn the non-linear relation between descriptive language and audio features by treating the problem as a large number of possible output classes linked to the same set of input features. We show how the RLSC training can easily eliminate irrelevant labels.","PeriodicalId":398813,"journal":{"name":"2002 IEEE Workshop on Multimedia Signal Processing.","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130825233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Eyeball Video Communications Platform","authors":"J. Vass, Shahadat Khan","doi":"10.1109/MMSP.2002.1203329","DOIUrl":"https://doi.org/10.1109/MMSP.2002.1203329","url":null,"abstract":"Eyeball Video Communications Platform (VCP) provides a comprehensive solution for video communications, instant messaging, remote collaboration and application development. Eyeball VCP supports one-to-one and many-to-many video communications and collaboration utilizing peer-to-peer data transport without employing any reflector service. This structure is not only cost effective but also provides minimal delay. Eyeball VCP is based on two key technologies: Eyeball Any-Bandwidth Technology and Eyeball Any-Firewall Technology. Eyeball Any-Bandwidth Technology guarantees the best possible audio-video quality for broadband, narrowband and wireless connections. Eyeball Any-Firewall Technology ensures that media can pass through both corporate and personal firewalls with minimal configuration without compromising security. Eyeball VCP is targeted at the following markets: application developers, Internet and communications service providers, and medium and large enterprises. Eyeball VCP won several industry awards, including Best of Show Award in Internet World, Product of the Year from the Communications ASP Magazine and The Editor's Choice Award from the Internet Telephony Magazine.","PeriodicalId":398813,"journal":{"name":"2002 IEEE Workshop on Multimedia Signal Processing.","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123807932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video quality objective metric using data hiding","authors":"Mylène C. Q. Farias, S. Mitra, M. Carli","doi":"10.1109/MMSP.2002.1203346","DOIUrl":"https://doi.org/10.1109/MMSP.2002.1203346","url":null,"abstract":"In this paper, a non-reference objective video quality metric is proposed. The quality metric is obtained by means of a non-conventional use of a data hiding technique. Test data are embedded in an MPEG-2 video; the basic assumption is that the embedded data undergo the same degradation as the host video. To analyze the performance of the system, the results obtained using this metric were compared with the perceived mean annoyance values. The annoyance values were obtained through a psychophysical experiment, which measured the threshold and mean annoyance values of compressed videos.","PeriodicalId":398813,"journal":{"name":"2002 IEEE Workshop on Multimedia Signal Processing.","volume":"113 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121042272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A new switch scheduling algorithm to improve QoS in the multimedia router","authors":"María Blanca Caminero, C. Carrión, F. Quiles, J. Duato, S. Yalamanchili","doi":"10.1109/MMSP.2002.1203324","DOIUrl":"https://doi.org/10.1109/MMSP.2002.1203324","url":null,"abstract":"The multimedia router (MMR) is aimed at providing QoS to multimedia flows, which coexist with conventional best-effort traffic, by means of a single-chip, compact router designed for cluster and local area environments. As the router is based on a multiplexed crossbar, hardware efficient link and switch scheduling algorithms are needed. Their goal is to achieve a high utilization, while the QoS needed by the multimedia connections is guaranteed. This work presents a novel switch scheduling algorithm, the candidate conflict arbiter (CCA), that can be efficiently implemented in the MMR. Simulation results show that this proposal beats other previous algorithms in terms of maximum throughput achieved while still providing QoS to the multimedia flows.","PeriodicalId":398813,"journal":{"name":"2002 IEEE Workshop on Multimedia Signal Processing.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121217614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combining stereo and visual hull information for on-line reconstruction and rendering of dynamic scenes","authors":"Ming Li, H. Schirmacher, M. Magnor, H. Seidel","doi":"10.1109/MMSP.2002.1203235","DOIUrl":"https://doi.org/10.1109/MMSP.2002.1203235","url":null,"abstract":"In this paper, we present a novel system which combines depth-from-stereo and visual hull reconstruction for acquiring dynamic real-world scenes at interactive rates. First, we use the silhouettes from multiple views to construct a polyhedral visual hull, which is then used to limit the disparity range during depth-from-stereo computation. The restricted search range improves both the speed and the quality of the stereo reconstruction. In return, stereo information can compensate for some of the limitations of the visual hull method, such as its inability to reconstruct surface details and concave regions. Our system achieves a reconstruction frame rate of 4 fps.","PeriodicalId":398813,"journal":{"name":"2002 IEEE Workshop on Multimedia Signal Processing.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128684668","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An edge and texture preserving algorithm for video error concealment","authors":"S. Belfiore, Marco Grangetto, E. Magli, G. Olmo","doi":"10.1109/MMSP.2002.1203263","DOIUrl":"https://doi.org/10.1109/MMSP.2002.1203263","url":null,"abstract":"We present a novel error concealment algorithm for block-based video transmission over error-prone networks. We develop a spatial error concealment technique, which combines edge-preserving interpolation and texture analysis and synthesis, providing a reconstruction of lost macroblocks optimized for visual perception. In particular, the algorithm recovers image edges by coarse-to-fine MAP estimation with a Markov random field prior, and replenishes lost textured areas with a texture synthesized from neighboring macroblocks. Experimental results show that texture synthesis allows achieving improved visual quality of the reconstructed area with respect to other state-of-the-art spatial concealment techniques.","PeriodicalId":398813,"journal":{"name":"2002 IEEE Workshop on Multimedia Signal Processing.","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126986143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}