Proceedings. Fourth IEEE International Conference on Multimodal Interfaces: Latest Publications

Articulated model based people tracking using motion models
Proceedings. Fourth IEEE International Conference on Multimodal Interfaces | Pub Date: 2002-10-14 | DOI: 10.1109/ICMI.2002.1167025
Huazhong Ning, Liang Wang, Weiming Hu, T. Tan
Abstract: This paper focuses on the acquisition of human motion data, such as joint angles and velocities, for virtual reality applications, using both an articulated body model and a motion model within the CONDENSATION framework. First, we learn a motion model represented by Gaussian distributions, explore motion constraints by considering the dependencies among motion parameters, and represent them as conditional distributions. Both are integrated into the dynamic model to concentrate factored sampling in the regions of the state space carrying the most posterior information. To measure the observation density accurately and robustly, a pose evaluation function (PEF) modeled with a radial term is proposed. We also address automatic acquisition of the initial model posture and recovery from severe tracking failures. A large number of experiments on several persons demonstrate that our approach works well.
Citations: 27
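The abstract above names the CONDENSATION framework with a learned Gaussian motion model and a pose evaluation function (PEF), but does not spell out the algorithm. As a rough orientation only, here is a minimal generic factored-sampling (particle filter) step in Python; the state layout, the Gaussian increment model, and the placeholder `dummy_pef` likelihood are assumptions standing in for the paper's articulated model and PEF.

```python
import numpy as np

def condensation_step(particles, weights, motion_mean, motion_cov, evaluate_pose, rng):
    """One CONDENSATION (factored sampling) step.

    particles : (N, D) array of joint-angle state vectors
    weights   : (N,) normalized importance weights
    motion_mean, motion_cov : learned Gaussian motion model (state increment)
    evaluate_pose : callable mapping a state vector to an observation likelihood
                    (stand-in for the paper's pose evaluation function)
    """
    n, d = particles.shape

    # 1. Resample according to the current weights (factored sampling).
    idx = rng.choice(n, size=n, p=weights)
    resampled = particles[idx]

    # 2. Predict: propagate each particle with the learned Gaussian motion model.
    noise = rng.multivariate_normal(motion_mean, motion_cov, size=n)
    predicted = resampled + noise

    # 3. Measure: weight each hypothesis by the observation likelihood.
    new_weights = np.array([evaluate_pose(x) for x in predicted])
    new_weights /= new_weights.sum()
    return predicted, new_weights

# Toy usage with a 4-DOF articulated state and a dummy likelihood.
rng = np.random.default_rng(0)
particles = rng.normal(size=(200, 4))
weights = np.full(200, 1.0 / 200)
dummy_pef = lambda x: np.exp(-0.5 * np.sum(x ** 2))   # placeholder likelihood
particles, weights = condensation_step(
    particles, weights,
    motion_mean=np.zeros(4), motion_cov=0.01 * np.eye(4),
    evaluate_pose=dummy_pef, rng=rng)
```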
Multi-modal translation system and its evaluation
Proceedings. Fourth IEEE International Conference on Multimodal Interfaces | Pub Date: 2002-10-14 | DOI: 10.1109/ICMI.2002.1167000
S. Morishima, Satoshi Nakamura
Abstract: Speech-to-speech translation has been studied to realize natural human communication beyond language barriers. Toward more natural multi-modal communication, visual information such as face and lip movements will be necessary. We introduce a multi-modal English-to-Japanese and Japanese-to-English translation system that also translates the speaker's speech motion while synchronizing it to the translated speech. To retain the speaker's facial expression, we substitute only the image of the speech organs with a synthesized one, generated by a three-dimensional wire-frame model that is adaptable to any speaker. Our approach enables image synthesis and translation with an extremely small database. We conduct subjective evaluation with a connected-digit discrimination test, using data with and without audio-visual lip synchronization. The results confirm the quality of the proposed audio-visual translation system and the importance of lip synchronization.
Citations: 2
Multi-modal temporal asynchronicity modeling by product HMMs for robust audio-visual speech recognition
Proceedings. Fourth IEEE International Conference on Multimodal Interfaces | Pub Date: 2002-10-14 | DOI: 10.1109/ICMI.2002.1167011
Satoshi Nakamura, K. Kumatani, S. Tamura
Abstract: The demand for audio-visual speech recognition (AVSR) has increased in order to make speech recognition systems robust to acoustic noise. There are two main research issues in audio-visual speech recognition: integration modeling that accounts for asynchronicity between modalities, and adaptive information weighting according to the reliability of each modality. This paper proposes a method to effectively integrate audio and visual information. Such integration inevitably necessitates modeling the synchronization and asynchronization of audio and visual information. To address the time-lag and correlation problems between speech and lip-movement features, we introduce an integrated HMM of audio-visual information based on a family of product HMMs. The proposed model can represent state synchronicity not only within a phoneme, but also between phonemes. Furthermore, we propose a rapid stream-weight optimization based on the GPD algorithm for noisy bimodal speech recognition. Evaluation experiments show that the proposed method improves recognition accuracy for noisy speech. At SNR = 0 dB, our proposed method attained 16% higher performance compared to a product HMM without synchronicity re-estimation.
Citations: 14
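The product-HMM idea in the abstract above pairs an audio state with a visual state in each composite state and weights the two streams during decoding. The sketch below is a toy illustration of just those two ingredients, not the paper's training or GPD-based weight optimization; the state names, model sizes, and example log-likelihoods are invented.

```python
import itertools

def product_states(audio_states, visual_states):
    """Composite states of a product HMM: every (audio, visual) state pair.
    Allowing pairs to differ lets the two streams drift apart (asynchrony);
    transition constraints would limit how far they can drift."""
    return list(itertools.product(audio_states, visual_states))

def stream_weighted_loglik(log_b_audio, log_b_visual, lam):
    """Observation log-score of a composite state with stream weight lam on
    audio and (1 - lam) on visual. In the paper lam is optimized (via GPD)
    according to the estimated reliability of each stream; here it is fixed."""
    return lam * log_b_audio + (1.0 - lam) * log_b_visual

# Toy example: 3 audio states x 3 visual states for one phoneme model.
states = product_states(["a0", "a1", "a2"], ["v0", "v1", "v2"])
print(len(states), "composite states")            # 9
print(stream_weighted_loglik(-4.2, -6.0, lam=0.7))
```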
Lip tracking for MPEG-4 facial animation
Proceedings. Fourth IEEE International Conference on Multimodal Interfaces | Pub Date: 2002-10-14 | DOI: 10.1109/ICMI.2002.1167009
Zhilin Wu, Petar S. Aleksic, A. Katsaggelos
Abstract: Accurately tracking the mouth of a talking person is very important for many applications, such as face recognition and human-computer interaction. This is in general a difficult problem due to the complexity of shapes, colors, textures, and changing lighting conditions. We develop techniques for outer and inner lip tracking. From the tracking results, facial animation parameters (FAPs) are extracted and used to drive an MPEG-4 decoder. A novel method consisting of a gradient vector flow (GVF) snake with a parabolic template as an additional external force is proposed. Based on the results of the outer lip tracking, the inner lip is tracked using a similarity function and a temporal smoothness constraint. Numerical results are presented using the Bernstein database.
Citations: 35
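The abstract above combines a gradient vector flow (GVF) snake with a parabolic template acting as an extra external force. The authors' force formulation is not reproduced here; the following is one plausible, simplified reading in which a standard implicit snake update is driven by a sampled GVF field plus a pull toward a least-squares parabola fitted to the current contour. All weights and the dummy GVF field are assumptions.

```python
import numpy as np

def snake_step(pts, gvf_force, alpha=0.1, beta=0.05, gamma=1.0,
               kappa_gvf=1.0, kappa_par=0.5):
    """One generic implicit active-contour update on a lip contour
    (a simplified reading, not the authors' formulation).

    pts       : (N, 2) contour points (x, y)
    gvf_force : callable (N, 2) -> (N, 2), samples the GVF field at the points
    alpha/beta: tension / rigidity of the internal energy
    kappa_*   : weights of the GVF force and the parabolic-template force
    """
    n = len(pts)
    # Pentadiagonal internal-energy matrix (open contour; boundary rows truncated).
    A = np.zeros((n, n))
    for i in range(n):
        A[i, i] = 2 * alpha + 6 * beta
        for j, v in ((1, -alpha - 4 * beta), (2, beta)):
            if i - j >= 0:
                A[i, i - j] += v
            if i + j < n:
                A[i, i + j] += v

    # Parabolic template: least-squares parabola through the current points,
    # pulling each point vertically toward the fitted curve.
    a, b, c = np.polyfit(pts[:, 0], pts[:, 1], 2)
    par_force = np.zeros_like(pts)
    par_force[:, 1] = (a * pts[:, 0] ** 2 + b * pts[:, 0] + c) - pts[:, 1]

    ext = kappa_gvf * gvf_force(pts) + kappa_par * par_force
    return np.linalg.solve(A + gamma * np.eye(n), gamma * pts + ext)

# Toy usage with a dummy GVF field that pushes all points slightly downward.
rng = np.random.default_rng(3)
pts = np.column_stack([np.linspace(-1, 1, 20), 0.1 * rng.normal(size=20)])
pts = snake_step(pts, gvf_force=lambda p: np.tile([0.0, -0.2], (len(p), 1)))
```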
A structural approach to distance rendering in personal auditory displays
Proceedings. Fourth IEEE International Conference on Multimodal Interfaces | Pub Date: 2002-10-14 | DOI: 10.1109/ICMI.2002.1166965
F. Fontana, D. Rocchesso, L. Ottaviani
Abstract: A virtual resonating environment aimed at enhancing our perception of distance is proposed. This environment reproduces the acoustics inside a tube, thus conveying distinctive distance cues to the listener. The corresponding resonator has been prototyped using a wave-based numerical scheme called the waveguide mesh, which gave the model the necessary versatility during the design and parameterization of the listening environment. Psychophysical tests show that this virtual environment conveys robust distance cues.
Citations: 13
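The resonator in the abstract above is prototyped with a waveguide mesh reproducing the acoustics inside a tube. As a much-reduced illustration of the underlying wave-based idea (not the paper's mesh), the sketch below simulates a tube as a 1-D bidirectional digital waveguide, i.e. two delay lines with reflective terminations; the tube length, reflection coefficients, and pickup position are arbitrary choices for the example.

```python
import numpy as np

def tube_impulse_response(length_samples, n_samples, r_left=-0.95, r_right=-0.95,
                          src_pos=2, mic_pos=None):
    """Impulse response of a lossy 1-D digital-waveguide 'tube'.

    Two delay lines carry right- and left-going waves; reflections at the tube
    ends are modelled by r_left / r_right. A full waveguide mesh generalizes
    this to 2-D/3-D junctions of such delay lines.
    """
    mic_pos = mic_pos if mic_pos is not None else length_samples // 2
    right = np.zeros(length_samples)   # right-going wave samples
    left = np.zeros(length_samples)    # left-going wave samples
    out = np.zeros(n_samples)

    right[src_pos] += 1.0              # inject an impulse into the tube
    for t in range(n_samples):
        out[t] = right[mic_pos] + left[mic_pos]   # pressure at the 'microphone'
        # Propagate one sample in each direction and reflect at the ends.
        new_right = np.empty_like(right)
        new_left = np.empty_like(left)
        new_right[1:] = right[:-1]
        new_left[:-1] = left[1:]
        new_right[0] = r_left * left[0]
        new_left[-1] = r_right * right[-1]
        right, left = new_right, new_left
    return out

ir = tube_impulse_response(length_samples=64, n_samples=1000)
```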
3D N-best search for simultaneous recognition of distant-talking speech of multiple talkers
Proceedings. Fourth IEEE International Conference on Multimodal Interfaces | Pub Date: 2002-10-14 | DOI: 10.1109/ICMI.2002.1166969
Satoshi Nakamura, P. Heracleous
Abstract: A microphone array is a promising solution for realizing hands-free speech recognition in real environments. Accurate talker localization is very important for speech recognition using a microphone array; however, localizing a moving talker is difficult in noisy, reverberant environments, and localization errors degrade recognition performance. To solve this problem, we previously proposed a speech recognition algorithm that considers multiple talker-direction hypotheses simultaneously, performing a Viterbi search in a 3-dimensional trellis space composed of talker directions, input frames, and HMM states. In this paper we describe a new algorithm for simultaneous recognition of distant-talking speech from multiple talkers using an extended 3D N-best search. The algorithm exploits path distance-based clustering and a likelihood normalization technique, which appeared to be necessary in order to build an efficient system for our purpose. We evaluated the proposed method using reverberant data, both simulated by the image method and recorded in a real room. The image method was used to examine the relationship between accuracy and reverberation time, and the real recordings were used to evaluate the practical performance of our algorithm. The Top-3 simultaneous word accuracy was 73.02% at a reverberation time of 162 ms using the image method.
Citations: 4
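The core of the approach above is a Viterbi search over a 3-dimensional trellis of talker directions, input frames, and HMM states. The sketch below shows only that dynamic-programming recursion in a brute-force form; the observation callable, the transition matrices, and the direction-change penalty are stand-ins, and the paper's N-best extension, clustering, and likelihood normalization are omitted.

```python
import numpy as np

def viterbi_3d(n_dirs, n_frames, n_states, log_obs, log_trans, log_dir_change):
    """Best-path score through a (direction, frame, state) trellis.

    log_obs(d, t, s)          : log-likelihood of frame t in state s, using the
                                beamformer steered to direction d (assumed callable)
    log_trans[s_prev, s]      : HMM state-transition log-probabilities
    log_dir_change[d_prev, d] : penalty for the talker moving between directions
    """
    NEG = -np.inf
    score = np.full((n_dirs, n_frames, n_states), NEG)
    for d in range(n_dirs):                        # initialization at t = 0
        score[d, 0, 0] = log_obs(d, 0, 0)

    for t in range(1, n_frames):
        for d in range(n_dirs):
            for s in range(n_states):
                best = NEG
                for dp in range(n_dirs):
                    for sp in range(n_states):
                        cand = (score[dp, t - 1, sp] + log_trans[sp, s]
                                + log_dir_change[dp, d])
                        best = max(best, cand)
                score[d, t, s] = best + log_obs(d, t, s)

    return score[:, -1, -1].max()                  # best score ending in the last state

# Toy run: 4 candidate directions, 10 frames, 3-state left-to-right HMM.
rng = np.random.default_rng(1)
lt = np.full((3, 3), -np.inf)
lt[[0, 0, 1, 1, 2], [0, 1, 1, 2, 2]] = np.log([0.6, 0.4, 0.6, 0.4, 1.0])
ld = np.log(np.full((4, 4), 0.05) + 0.8 * np.eye(4))
print(viterbi_3d(4, 10, 3, lambda d, t, s: rng.normal(-2.0, 0.5), lt, ld))
```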
Hand tracking using spatial gesture modeling and visual feedback for a virtual DJ system
Proceedings. Fourth IEEE International Conference on Multimodal Interfaces | Pub Date: 2002-10-14 | DOI: 10.1109/ICMI.2002.1166992
Edward C. Lin, A. Cassidy, Dan Hook, Avinash Baliga, Tsuhan Chen
Abstract: The ability to accurately track hand movement provides new opportunities for human-computer interaction (HCI). Many of today's commercial glove-based hand tracking devices are cumbersome and expensive. An approach that avoids these problems is to use computer vision to capture hand motion. We present a complete real-time hand tracking and 3-D modeling system based on a single camera. In our system, we extract feature points from a video stream of a hand to control a virtual hand model with 2-D global motion and 3-D local motion. The on-screen model gives the user instant feedback on the estimated position of the hand, allowing the user to compensate for tracking errors. The system is demonstrated in three example applications: the first uses hand tracking and gestures to take on the role of the mouse, the second interacts with a 3-D virtual environment using the 3-D hand model, and the third is a virtual DJ system controlled by hand motion tracking and gestures.
Citations: 24
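The abstract above mentions driving the virtual hand with 2-D global motion estimated from tracked feature points, without detailing the estimation. One simple possibility, shown below purely as an illustration (not necessarily the authors' method), is a least-squares similarity transform (scale, rotation, translation) between corresponding points in consecutive frames.

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares 2-D similarity transform mapping src points to dst.

    Solves for (a, b, tx, ty) in  x' = a*x - b*y + tx,  y' = b*x + a*y + ty,
    i.e. a combined rotation/scale plus a translation.
    src, dst : (N, 2) arrays of corresponding feature points (N >= 2).
    """
    x, y = src[:, 0], src[:, 1]
    n = len(src)
    A = np.zeros((2 * n, 4))
    A[0::2] = np.column_stack([x, -y, np.ones(n), np.zeros(n)])   # rows for x'
    A[1::2] = np.column_stack([y, x, np.zeros(n), np.ones(n)])    # rows for y'
    rhs = dst.reshape(-1)
    (a, b, tx, ty), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    scale = np.hypot(a, b)
    angle = np.arctan2(b, a)
    return scale, angle, np.array([tx, ty])

# Toy usage: points rotated by 10 degrees, scaled by 1.1, shifted by (5, -3).
rng = np.random.default_rng(2)
src = rng.uniform(0, 100, size=(8, 2))
th, s, t = np.deg2rad(10), 1.1, np.array([5.0, -3.0])
R = s * np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
dst = src @ R.T + t
print(fit_similarity(src, dst))   # ~ (1.1, 0.1745 rad, [5, -3])
```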
Context-based multimodal input understanding in conversational systems
Proceedings. Fourth IEEE International Conference on Multimodal Interfaces | Pub Date: 2002-10-14 | DOI: 10.1109/ICMI.2002.1166974
J. Chai, Shimei Pan, Michelle X. Zhou, K. Houck
Abstract: In a multimodal human-machine conversation, user inputs are often abbreviated or imprecise, and merely fusing multimodal inputs together cannot always derive a complete understanding. To address these inadequacies, we are building a semantics-based multimodal interpretation framework called MIND (Multimodal Interpretation for Natural Dialog). The unique feature of MIND is the use of a variety of contexts (e.g., domain context and conversation context) to enhance multimodal fusion. In this paper we present a semantically rich modeling scheme and a context-based approach that enable MIND to gain a full understanding of user inputs, including ambiguous and incomplete ones.
Citations: 20
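MIND is described above only at the level of semantic modeling plus domain and conversation context; no concrete data structures are given. The toy sketch below is an invented, drastically simplified reading in which a partial spoken request is completed first from a gesture referent and then from conversation context; the frame fields and slot names are illustrative, not MIND's.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Interpretation:
    """A toy semantic frame for one user turn (not MIND's actual scheme)."""
    intent: Optional[str] = None        # e.g. "show_attribute"
    referent: Optional[str] = None      # object the user is talking about
    attributes: dict = field(default_factory=dict)

def fuse(speech: Interpretation, gesture: Interpretation,
         context: Interpretation) -> Interpretation:
    """Fill gaps in the fused interpretation from the other modality first,
    then from the conversation context (e.g. the previously discussed object)."""
    return Interpretation(
        intent=speech.intent or gesture.intent or context.intent,
        referent=speech.referent or gesture.referent or context.referent,
        attributes={**context.attributes, **gesture.attributes, **speech.attributes},
    )

# "Show me the price of this one" + pointing gesture, with a prior topic.
speech = Interpretation(intent="show_attribute", attributes={"attribute": "price"})
gesture = Interpretation(referent="house_42")
context = Interpretation(referent="house_17", attributes={"city": "Boston"})
print(fuse(speech, gesture, context))
# -> intent='show_attribute', referent='house_42' (the gesture wins over context)
```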
Head-pose invariant facial expression recognition using convolutional neural networks
Proceedings. Fourth IEEE International Conference on Multimodal Interfaces | Pub Date: 2002-10-14 | DOI: 10.1109/ICMI.2002.1167051
B. Fasel
Abstract: Automatic face analysis has to cope with pose and lighting variations. Pose variations are particularly difficult to tackle, and many face analysis methods require sophisticated normalization and initialization procedures. We propose a data-driven face analysis approach that is not only capable of extracting features relevant to a given face analysis task, but is also more robust to face location changes and scale variations than classical methods such as MLPs. Our approach is based on convolutional neural networks that use multi-scale feature extractors, which allow for improved facial expression recognition results on faces subject to in-plane pose variations.
Citations: 55
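The robustness claimed above comes from convolutional networks with multi-scale feature extractors, but the architecture is not specified in the abstract. The PyTorch sketch below only illustrates the multi-scale idea with parallel convolution branches of different kernel sizes feeding one classifier; the layer sizes, input resolution, and seven expression classes are assumptions, not the paper's network.

```python
import torch
import torch.nn as nn

class MultiScaleExpressionNet(nn.Module):
    """Toy multi-scale CNN: parallel convolution branches with different
    receptive-field sizes, concatenated before classification."""

    def __init__(self, n_classes: int = 7):
        super().__init__()

        def branch(kernel):
            return nn.Sequential(
                nn.Conv2d(1, 8, kernel, padding=kernel // 2), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(8, 16, kernel, padding=kernel // 2), nn.ReLU(),
                nn.MaxPool2d(2),
            )

        # Small, medium, and large kernels extract features at several scales,
        # which helps tolerate shifts and scale changes of the face.
        self.branches = nn.ModuleList([branch(k) for k in (3, 5, 7)])
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16 * 3, n_classes),
        )

    def forward(self, x):
        feats = torch.cat([b(x) for b in self.branches], dim=1)
        return self.classifier(feats)

# One 64x64 grayscale face image -> 7 expression logits.
net = MultiScaleExpressionNet()
print(net(torch.randn(1, 1, 64, 64)).shape)   # torch.Size([1, 7])
```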
Mobile multi-modal data services for GPRS phones and beyond
Proceedings. Fourth IEEE International Conference on Multimodal Interfaces | Pub Date: 2002-10-14 | DOI: 10.1109/ICMI.2002.1167018
Georg Niklfeld, Michael Pucher, R. Finan, W. Eckhart
Abstract: The paper discusses means of building multi-modal data services on existing GPRS infrastructure, and puts the proposed simple solutions in the perspective of technological possibilities that will become available in public mobile communications networks over the next few years, along the progression from 2G/GSM systems through GPRS to 3G systems such as UMTS, or equivalently to 802.11 networks. Three demonstrators are presented, developed by the authors in an application-oriented research project co-financed by telecommunications companies. The first two, push-to-talk address entry for a route finder and an open-microphone map-content navigator, simulate a UMTS or WLAN scenario. The third implements a multi-modal map finder in a live public GPRS network using WAP Push. Indications of usability are given. The paper argues for the importance of open, standards-based architectures that can spur attractive multi-modal services in the short term, as current economic difficulties in the telecommunications industry put support for long-term research into more advanced forms of multi-modality in question.
Citations: 6