2002 IEEE Workshop on Multimedia Signal Processing: Latest Publications

Similarity matching of continuous melody contours for humming querying of melody databases
2002 IEEE Workshop on Multimedia Signal Processing. Pub Date : 2002-12-09 DOI: 10.1109/MMSP.2002.1203293
Yongwei Zhu, M. Kankanhalli, Q. Tian
{"title":"Similarity matching of continuous melody contours for humming querying of melody databases","authors":"Yongwei Zhu, M. Kankanhalli, Q. Tian","doi":"10.1109/MMSP.2002.1203293","DOIUrl":"https://doi.org/10.1109/MMSP.2002.1203293","url":null,"abstract":"Music query-by-humming is a challenging problem since the humming query inevitably contains much variation and inaccuracy. In this paper, we present a novel melody similarity matching technique, which is based on continuous melody contour. We introduce a contour alignment technique, which addresses the robustness and efficiency issues. We also present a new melody similarity metric, which is performed directly on continuous melody contours of the query data. This approach cleanly separates the alignment and similarity measurement in the retrieval process. Our melody alignment method can reduce the matching candidate to 1.7% with 90% correct alignment rate. The overall retrieval system achieved 88% correct retrieval in the top 20 rank lists.","PeriodicalId":398813,"journal":{"name":"2002 IEEE Workshop on Multimedia Signal Processing.","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123444745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20
On the capacity of the reachback channel in wireless sensor networks
2002 IEEE Workshop on Multimedia Signal Processing. Pub Date : 2002-12-09 DOI: 10.1109/MMSP.2002.1203332
J. Barros, S. Servetto
{"title":"On the capacity of the reachback channel in wireless sensor networks","authors":"J. Barros, S. Servetto","doi":"10.1109/MMSP.2002.1203332","DOIUrl":"https://doi.org/10.1109/MMSP.2002.1203332","url":null,"abstract":"We consider the problem of reachback communication in wireless sensor networks: multiple sensors are deployed on a field, and they collect local measurements of some random process which then need to be encoded and reproduced at a remote location. In this paper we present a number of information theoretic bounds on the performance of a distributed transmission array that is formed by a large number of cheap, unreliable sensors. We formulate this problem in terms of classical network information theory concepts, a formulation which leads us to consider two important cases: transmission of correlated sources over multiple independent channels, and rate/distortion with separate encoders.","PeriodicalId":398813,"journal":{"name":"2002 IEEE Workshop on Multimedia Signal Processing.","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131917258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 46
A speech-centric perspective for human-computer interface
2002 IEEE Workshop on Multimedia Signal Processing. Pub Date : 2002-12-09 DOI: 10.1109/MMSP.2002.1203296
L. Deng, A. Acero, Ye-Yi Wang, Kuansan Wang, H. Hon, J. Droppo, M. Mahajan, Xuedong Huang
{"title":"A speech-centric perspective for human-computer interface","authors":"L. Deng, A. Acero, Ye-Yi Wang, Kuansan Wang, H. Hon, J. Droppo, M. Mahajan, Xuedong Huang","doi":"10.1109/MMSP.2002.1203296","DOIUrl":"https://doi.org/10.1109/MMSP.2002.1203296","url":null,"abstract":"Speech technology has been playing a central role in enhancing human-machine interactions, especially for small devices for which GUI has obvious limitations. The speech-centric perspective for human-computer interface advanced in this paper derives from the view that speech is the only natural and expressive modality to enable people to access information from and to interact with any device. In this paper, we describe the work conducted at Microsoft Research, in the project codenamed Dr.Who, aimed at the development of enabling technologies for speech-centric multimodal human-computer interaction. In particular, we present MiPad as the first Dr.Who's application that addresses specifically the mobile user interaction scenario. MiPad is a wireless mobile PDA prototype that enables users to accomplish many common tasks using a multimodal spoken language interface and wireless-data technologies. It fully integrates continuous speech recognition and spoken language understanding, and provides a novel solution to the current prevailing problem of pecking with tiny styluses or typing on minuscule keyboards in today's PDAs or smart phones.","PeriodicalId":398813,"journal":{"name":"2002 IEEE Workshop on Multimedia Signal Processing.","volume":"71 27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116558024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Context based coding of quantized alpha planes for video objects
2002 IEEE Workshop on Multimedia Signal Processing. Pub Date : 2002-12-09 DOI: 10.1109/MMSP.2002.1203258
S. M. Aghito, Søren Forchhammer
{"title":"Context based coding of quantized alpha planes for video objects","authors":"S. M. Aghito, Søren Forchhammer","doi":"10.1109/MMSP.2002.1203258","DOIUrl":"https://doi.org/10.1109/MMSP.2002.1203258","url":null,"abstract":"In object based video, each frame is a composition of objects that are coded separately. The composition is performed through the alpha plane that represents the transparency of the object. We present an alternative to MPEG-4 for coding of alpha planes that considers their specific properties. Comparisons in terms of rate and distortion are provided, showing that the proposed coding scheme for still alpha planes is better than the algorithms for I-frames used in MPEG-4.","PeriodicalId":398813,"journal":{"name":"2002 IEEE Workshop on Multimedia Signal Processing.","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120959524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
An open source development tool for anthropomorphic dialog agent: face image synthesis and lip synchronization
2002 IEEE Workshop on Multimedia Signal Processing. Pub Date : 2002-12-09 DOI: 10.1109/MMSP.2002.1203298
T. Yotsukura, S. Morishima
{"title":"An open source development tool for anthropomorphic dialog agent: face image synthesis and lip synchronization","authors":"T. Yotsukura, S. Morishima","doi":"10.1109/MMSP.2002.1203298","DOIUrl":"https://doi.org/10.1109/MMSP.2002.1203298","url":null,"abstract":"We describe the design and report the development of an open source ware toolkit for building an easily customizable anthropomorphic dialog agent. This toolkit consists of four modules for multi-modal dialog integration, speech recognition, speech synthesis, and face image synthesis. In this paper, we focus on the construction of an agent's face image synthesis.","PeriodicalId":398813,"journal":{"name":"2002 IEEE Workshop on Multimedia Signal Processing.","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131210136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Recent progress in spontaneous speech recognition and understanding
2002 IEEE Workshop on Multimedia Signal Processing. Pub Date : 2002-12-09 DOI: 10.1109/MMSP.2002.1203294
S. Furui
{"title":"Recent progress in spontaneous speech recognition and understanding","authors":"S. Furui","doi":"10.1109/MMSP.2002.1203294","DOIUrl":"https://doi.org/10.1109/MMSP.2002.1203294","url":null,"abstract":"How to recognize and understand spontaneous speech is one of the most important issues in state-of-the-art speech recognition technology. In this context, a five-year large scale national project entitled \"Spontaneous speech: corpus and processing technology\" started in Japan in 1999. This paper gives an overview of the project and reports on the major results of experiments that have been conducted so far at Tokyo Institute of Technology, including spontaneous presentation speech recognition, automatic speech summarization, and message-driven speech recognition. The paper also discusses the most important research problems to be solved in order to achieve ultimate spontaneous speech recognition systems.","PeriodicalId":398813,"journal":{"name":"2002 IEEE Workshop on Multimedia Signal Processing.","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128280509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
Design-Trotter: a multimedia embedded systems design space exploration tool
2002 IEEE Workshop on Multimedia Signal Processing. Pub Date : 2002-12-09 DOI: 10.1109/MMSP.2002.1203342
Y. Moullec, J. Diguet, J. Philippe
{"title":"Design-Trotter: a multimedia embedded systems design space exploration tool","authors":"Y. Moullec, J. Diguet, J. Philippe","doi":"10.1109/MMSP.2002.1203342","DOIUrl":"https://doi.org/10.1109/MMSP.2002.1203342","url":null,"abstract":"In this paper we present the intra-function dynamic estimation step of our system-level design space exploration tool. The aim of our global methodology is to fill the gap between system specification and the tasks of the system design flow to converge towards an efficient system on chip architecture for multimedia applications. In this context, the intra-function estimation step rapidly provides for each functional block of the specification, trade-off curves which represent a large set of parallelism options for both data-transfer and processing resources. A set of methods used to achieve this estimation process is detailed.","PeriodicalId":398813,"journal":{"name":"2002 IEEE Workshop on Multimedia Signal Processing.","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128597886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13
A ranking technique for fast audio identification
2002 IEEE Workshop on Multimedia Signal Processing. Pub Date : 2002-12-09 DOI: 10.1109/MMSP.2002.1203278
F. Kurth
{"title":"A ranking technique for fast audio identification","authors":"F. Kurth","doi":"10.1109/MMSP.2002.1203278","DOIUrl":"https://doi.org/10.1109/MMSP.2002.1203278","url":null,"abstract":"We introduce a novel ranking technique for fast and robust audio identification. In this approach, identification is performed by evaluating certain parts of correlation sequences. Starting from a general framework for content-based audio identification, we show how to integrate our new algorithm into a fast index-based search algorithm. We demonstrate the capabilities of our approach by considering the tasks of identifying highly distorted audio material and search fragments in large scale MP3 data bases.","PeriodicalId":398813,"journal":{"name":"2002 IEEE Workshop on Multimedia Signal Processing.","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122313557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
A closed-form solution to the autocorrelation matching method for wireless MIMO communications
2002 IEEE Workshop on Multimedia Signal Processing. Pub Date : 2002-12-09 DOI: 10.1109/MMSP.2002.1203328
Hui Luo, L. Luo, Ruey-Wen Liu
{"title":"A closed-form solution to the autocorrelation matching method for wireless MIMO communications","authors":"Hui Luo, L. Luo, Ruey-Wen Liu","doi":"10.1109/MMSP.2002.1203328","DOIUrl":"https://doi.org/10.1109/MMSP.2002.1203328","url":null,"abstract":"Wireless MIMO (multiple input multiple output) communication techniques are proposed to boost spectrum efficiency using antenna arrays so that broadband data such as multimedia signals can be transmitted over a limited bandwidth. The AM (autocorrelation matching) method is a SOS (second-order statistics) based blind MIMO-FIR equalization technique that may enable wireless MIMO communications without identifying MIMO-FIR channels. This paper presents a closed-form solution to computing the optimal zero-forcing equalizer for the AM method using the knowledge of second-order statistics of the transmitted signals and received signals. Numerical simulations are given to show the effectiveness of the AM method and the performance of the closed-form solution.","PeriodicalId":398813,"journal":{"name":"2002 IEEE Workshop on Multimedia Signal Processing.","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124676748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Personalized video summary using visual semantic annotations and automatic speech transcriptions
2002 IEEE Workshop on Multimedia Signal Processing. Pub Date : 2002-12-09 DOI: 10.1109/MMSP.2002.1203234
Belle L. Tseng, Ching-Yung Lin
{"title":"Personalized video summary using visual semantic annotations and automatic speech transcriptions","authors":"Belle L. Tseng, Ching-Yung Lin","doi":"10.1109/MMSP.2002.1203234","DOIUrl":"https://doi.org/10.1109/MMSP.2002.1203234","url":null,"abstract":"A personalized video summary is dynamically generated in our video personalization and summary system based on user preference and usage environment. The three-tier personalization system adopts the server-middleware-client architecture in order to maintain, select, adapt, and deliver rich media content to the user. The server stores the content sources along with their corresponding MPEG-7 metadata descriptions. In this paper, the metadata includes visual semantic annotations and automatic speech transcriptions. Our personalization and summarization engine in the middleware selects the optimal set of desired video segments by matching shot annotations and sentence transcripts with user preferences. The process includes the shot-to-sentence alignment, summary segment selection, and user preference matching and propagation. As a result, the relevant visual shot and audio sentence segments are aggregated and composed into a personalized video summary.","PeriodicalId":398813,"journal":{"name":"2002 IEEE Workshop on Multimedia Signal Processing.","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128977936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 17