Proceedings. Fourth IEEE International Conference on Multimodal Interfaces: Latest Publications

Articulated model based people tracking using motion models
Proceedings. Fourth IEEE International Conference on Multimodal Interfaces | Pub Date: 2002-10-14 | DOI: 10.1109/ICMI.2002.1167025
Huazhong Ning, Liang Wang, Weiming Hu, T. Tan
Abstract: This paper focuses on the acquisition of human motion data, such as joint angles and velocities, for virtual reality applications, using both an articulated body model and a motion model within the CONDENSATION framework. First, we learn a motion model represented by Gaussian distributions, explore motion constraints by considering the dependencies among motion parameters, and represent them as conditional distributions. Both are integrated into the dynamic model to concentrate factored sampling in the regions of the state space carrying the most posterior information. To measure the observation density accurately and robustly, a pose evaluation function (PEF) modeled with a radial term is proposed. We also address automatic acquisition of the initial model posture and recovery from severe tracking failures. A large number of experiments on several persons demonstrate that our approach works well.
Citations: 27
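The abstract above names the CONDENSATION framework with a learned Gaussian motion model and a pose evaluation function (PEF), but does not spell out the algorithm. As a rough orientation only, here is a minimal generic factored-sampling (particle filter) step in Python; the state layout, the Gaussian increment model, and the placeholder `dummy_pef` likelihood are assumptions standing in for the paper's articulated model and PEF.

```python
import numpy as np

def condensation_step(particles, weights, motion_mean, motion_cov, evaluate_pose, rng):
    """One CONDENSATION (factored sampling) step.

    particles : (N, D) array of joint-angle state vectors
    weights   : (N,) normalized importance weights
    motion_mean, motion_cov : learned Gaussian motion model (state increment)
    evaluate_pose : callable mapping a state vector to an observation likelihood
                    (stand-in for the paper's pose evaluation function)
    """
    n, d = particles.shape

    # 1. Resample according to the current weights (factored sampling).
    idx = rng.choice(n, size=n, p=weights)
    resampled = particles[idx]

    # 2. Predict: propagate each particle with the learned Gaussian motion model.
    noise = rng.multivariate_normal(motion_mean, motion_cov, size=n)
    predicted = resampled + noise

    # 3. Measure: weight each hypothesis by the observation likelihood.
    new_weights = np.array([evaluate_pose(x) for x in predicted])
    new_weights /= new_weights.sum()
    return predicted, new_weights

# Toy usage with a 4-DOF articulated state and a dummy likelihood.
rng = np.random.default_rng(0)
particles = rng.normal(size=(200, 4))
weights = np.full(200, 1.0 / 200)
dummy_pef = lambda x: np.exp(-0.5 * np.sum(x ** 2))   # placeholder likelihood
particles, weights = condensation_step(
    particles, weights,
    motion_mean=np.zeros(4), motion_cov=0.01 * np.eye(4),
    evaluate_pose=dummy_pef, rng=rng)
```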
Multi-modal translation system and its evaluation
Proceedings. Fourth IEEE International Conference on Multimodal Interfaces | Pub Date: 2002-10-14 | DOI: 10.1109/ICMI.2002.1167000
S. Morishima, Satoshi Nakamura
Abstract: Speech-to-speech translation has been studied to realize natural human communication beyond language barriers. Toward more natural multi-modal communication, visual information such as face and lip movements will be necessary. We introduce a multi-modal English-to-Japanese and Japanese-to-English translation system that also translates the speaker's speech motion while synchronizing it to the translated speech. To retain the speaker's facial expression, we substitute only the image of the speech organs with a synthesized one, generated by a three-dimensional wire-frame model that is adaptable to any speaker. Our approach enables image synthesis and translation with an extremely small database. We conduct subjective evaluation with a connected-digit discrimination test, using data with and without audio-visual lip synchronization. The results confirm the quality of the proposed audio-visual translation system and the importance of lip synchronization.
Citations: 2
Multi-modal temporal asynchronicity modeling by product HMMs for robust audio-visual speech recognition
Proceedings. Fourth IEEE International Conference on Multimodal Interfaces | Pub Date: 2002-10-14 | DOI: 10.1109/ICMI.2002.1167011
Satoshi Nakamura, K. Kumatani, S. Tamura
Abstract: The demand for audio-visual speech recognition (AVSR) has increased in order to make speech recognition systems robust to acoustic noise. There are two main research issues in audio-visual speech recognition: integration modeling that accounts for asynchronicity between modalities, and adaptive information weighting according to the reliability of each modality. This paper proposes a method to effectively integrate audio and visual information. Such integration inevitably necessitates modeling the synchronization and asynchronization of audio and visual information. To address the time-lag and correlation problems between speech and lip-movement features, we introduce an integrated HMM of audio-visual information based on a family of product HMMs. The proposed model can represent state synchronicity not only within a phoneme, but also between phonemes. Furthermore, we propose a rapid stream-weight optimization based on the GPD algorithm for noisy bimodal speech recognition. Evaluation experiments show that the proposed method improves recognition accuracy for noisy speech. At SNR = 0 dB, our proposed method attained 16% higher performance compared to a product HMM without synchronicity re-estimation.
Citations: 14
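The product-HMM idea in the abstract above pairs an audio state with a visual state in each composite state and weights the two streams during decoding. The sketch below is a toy illustration of just those two ingredients, not the paper's training or GPD-based weight optimization; the state names, model sizes, and example log-likelihoods are invented.

```python
import itertools

def product_states(audio_states, visual_states):
    """Composite states of a product HMM: every (audio, visual) state pair.
    Allowing pairs to differ lets the two streams drift apart (asynchrony);
    transition constraints would limit how far they can drift."""
    return list(itertools.product(audio_states, visual_states))

def stream_weighted_loglik(log_b_audio, log_b_visual, lam):
    """Observation log-score of a composite state with stream weight lam on
    audio and (1 - lam) on visual. In the paper lam is optimized (via GPD)
    according to the estimated reliability of each stream; here it is fixed."""
    return lam * log_b_audio + (1.0 - lam) * log_b_visual

# Toy example: 3 audio states x 3 visual states for one phoneme model.
states = product_states(["a0", "a1", "a2"], ["v0", "v1", "v2"])
print(len(states), "composite states")            # 9
print(stream_weighted_loglik(-4.2, -6.0, lam=0.7))
```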
Lip tracking for MPEG-4 facial animation
Proceedings. Fourth IEEE International Conference on Multimodal Interfaces | Pub Date: 2002-10-14 | DOI: 10.1109/ICMI.2002.1167009
Zhilin Wu, Petar S. Aleksic, A. Katsaggelos
Abstract: Accurately tracking the mouth of a talking person is very important for many applications, such as face recognition and human-computer interaction. This is in general a difficult problem due to the complexity of shapes, colors, textures, and changing lighting conditions. We develop techniques for outer and inner lip tracking. From the tracking results, facial animation parameters (FAPs) are extracted and used to drive an MPEG-4 decoder. A novel method consisting of a gradient vector flow (GVF) snake with a parabolic template as an additional external force is proposed. Based on the results of the outer lip tracking, the inner lip is tracked using a similarity function and a temporal smoothness constraint. Numerical results are presented using the Bernstein database.
Citations: 35
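The abstract above combines a gradient vector flow (GVF) snake with a parabolic template acting as an extra external force. The authors' force formulation is not reproduced here; the following is one plausible, simplified reading in which a standard implicit snake update is driven by a sampled GVF field plus a pull toward a least-squares parabola fitted to the current contour. All weights and the dummy GVF field are assumptions.

```python
import numpy as np

def snake_step(pts, gvf_force, alpha=0.1, beta=0.05, gamma=1.0,
               kappa_gvf=1.0, kappa_par=0.5):
    """One generic implicit active-contour update on a lip contour
    (a simplified reading, not the authors' formulation).

    pts       : (N, 2) contour points (x, y)
    gvf_force : callable (N, 2) -> (N, 2), samples the GVF field at the points
    alpha/beta: tension / rigidity of the internal energy
    kappa_*   : weights of the GVF force and the parabolic-template force
    """
    n = len(pts)
    # Pentadiagonal internal-energy matrix (open contour; boundary rows truncated).
    A = np.zeros((n, n))
    for i in range(n):
        A[i, i] = 2 * alpha + 6 * beta
        for j, v in ((1, -alpha - 4 * beta), (2, beta)):
            if i - j >= 0:
                A[i, i - j] += v
            if i + j < n:
                A[i, i + j] += v

    # Parabolic template: least-squares parabola through the current points,
    # pulling each point vertically toward the fitted curve.
    a, b, c = np.polyfit(pts[:, 0], pts[:, 1], 2)
    par_force = np.zeros_like(pts)
    par_force[:, 1] = (a * pts[:, 0] ** 2 + b * pts[:, 0] + c) - pts[:, 1]

    ext = kappa_gvf * gvf_force(pts) + kappa_par * par_force
    return np.linalg.solve(A + gamma * np.eye(n), gamma * pts + ext)

# Toy usage with a dummy GVF field that pushes all points slightly downward.
rng = np.random.default_rng(3)
pts = np.column_stack([np.linspace(-1, 1, 20), 0.1 * rng.normal(size=20)])
pts = snake_step(pts, gvf_force=lambda p: np.tile([0.0, -0.2], (len(p), 1)))
```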
A structural approach to distance rendering in personal auditory displays
Proceedings. Fourth IEEE International Conference on Multimodal Interfaces | Pub Date: 2002-10-14 | DOI: 10.1109/ICMI.2002.1166965
F. Fontana, D. Rocchesso, L. Ottaviani
Abstract: A virtual resonating environment aimed at enhancing our perception of distance is proposed. This environment reproduces the acoustics inside a tube, thus conveying distinctive distance cues to the listener. The corresponding resonator has been prototyped using a wave-based numerical scheme called the waveguide mesh, which gave the model the necessary versatility during the design and parameterization of the listening environment. Psychophysical tests show that this virtual environment conveys robust distance cues.
Citations: 13
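The resonator in the abstract above is prototyped with a waveguide mesh reproducing the acoustics inside a tube. As a much-reduced illustration of the underlying wave-based idea (not the paper's mesh), the sketch below simulates a tube as a 1-D bidirectional digital waveguide, i.e. two delay lines with reflective terminations; the tube length, reflection coefficients, and pickup position are arbitrary choices for the example.

```python
import numpy as np

def tube_impulse_response(length_samples, n_samples, r_left=-0.95, r_right=-0.95,
                          src_pos=2, mic_pos=None):
    """Impulse response of a lossy 1-D digital-waveguide 'tube'.

    Two delay lines carry right- and left-going waves; reflections at the tube
    ends are modelled by r_left / r_right. A full waveguide mesh generalizes
    this to 2-D/3-D junctions of such delay lines.
    """
    mic_pos = mic_pos if mic_pos is not None else length_samples // 2
    right = np.zeros(length_samples)   # right-going wave samples
    left = np.zeros(length_samples)    # left-going wave samples
    out = np.zeros(n_samples)

    right[src_pos] += 1.0              # inject an impulse into the tube
    for t in range(n_samples):
        out[t] = right[mic_pos] + left[mic_pos]   # pressure at the 'microphone'
        # Propagate one sample in each direction and reflect at the ends.
        new_right = np.empty_like(right)
        new_left = np.empty_like(left)
        new_right[1:] = right[:-1]
        new_left[:-1] = left[1:]
        new_right[0] = r_left * left[0]
        new_left[-1] = r_right * right[-1]
        right, left = new_right, new_left
    return out

ir = tube_impulse_response(length_samples=64, n_samples=1000)
```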
3D N-best search for simultaneous recognition of distant-talking speech of multiple talkers
Proceedings. Fourth IEEE International Conference on Multimodal Interfaces | Pub Date: 2002-10-14 | DOI: 10.1109/ICMI.2002.1166969
Satoshi Nakamura, P. Heracleous
Abstract: A microphone array is a promising solution for realizing hands-free speech recognition in real environments. Accurate talker localization is very important for speech recognition using a microphone array; however, localizing a moving talker is difficult in noisy, reverberant environments, and localization errors degrade recognition performance. To solve this problem, we previously proposed a speech recognition algorithm that considers multiple talker-direction hypotheses simultaneously, performing a Viterbi search in a 3-dimensional trellis space composed of talker directions, input frames, and HMM states. In this paper we describe a new algorithm for simultaneous recognition of distant-talking speech from multiple talkers using an extended 3D N-best search. The algorithm exploits path distance-based clustering and a likelihood normalization technique, which appeared to be necessary in order to build an efficient system for our purpose. We evaluated the proposed method using reverberant data, both simulated by the image method and recorded in a real room. The image method was used to examine the relationship between accuracy and reverberation time, and the real recordings were used to evaluate the practical performance of our algorithm. The Top-3 simultaneous word accuracy was 73.02% at a reverberation time of 162 ms using the image method.
Citations: 4
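The core of the approach above is a Viterbi search over a 3-dimensional trellis of talker directions, input frames, and HMM states. The sketch below shows only that dynamic-programming recursion in a brute-force form; the observation callable, the transition matrices, and the direction-change penalty are stand-ins, and the paper's N-best extension, clustering, and likelihood normalization are omitted.

```python
import numpy as np

def viterbi_3d(n_dirs, n_frames, n_states, log_obs, log_trans, log_dir_change):
    """Best-path score through a (direction, frame, state) trellis.

    log_obs(d, t, s)          : log-likelihood of frame t in state s, using the
                                beamformer steered to direction d (assumed callable)
    log_trans[s_prev, s]      : HMM state-transition log-probabilities
    log_dir_change[d_prev, d] : penalty for the talker moving between directions
    """
    NEG = -np.inf
    score = np.full((n_dirs, n_frames, n_states), NEG)
    for d in range(n_dirs):                        # initialization at t = 0
        score[d, 0, 0] = log_obs(d, 0, 0)

    for t in range(1, n_frames):
        for d in range(n_dirs):
            for s in range(n_states):
                best = NEG
                for dp in range(n_dirs):
                    for sp in range(n_states):
                        cand = (score[dp, t - 1, sp] + log_trans[sp, s]
                                + log_dir_change[dp, d])
                        best = max(best, cand)
                score[d, t, s] = best + log_obs(d, t, s)

    return score[:, -1, -1].max()                  # best score ending in the last state

# Toy run: 4 candidate directions, 10 frames, 3-state left-to-right HMM.
rng = np.random.default_rng(1)
lt = np.full((3, 3), -np.inf)
lt[[0, 0, 1, 1, 2], [0, 1, 1, 2, 2]] = np.log([0.6, 0.4, 0.6, 0.4, 1.0])
ld = np.log(np.full((4, 4), 0.05) + 0.8 * np.eye(4))
print(viterbi_3d(4, 10, 3, lambda d, t, s: rng.normal(-2.0, 0.5), lt, ld))
```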
Hand tracking using spatial gesture modeling and visual feedback for a virtual DJ system
Proceedings. Fourth IEEE International Conference on Multimodal Interfaces | Pub Date: 2002-10-14 | DOI: 10.1109/ICMI.2002.1166992
Edward C. Lin, A. Cassidy, Dan Hook, Avinash Baliga, Tsuhan Chen
Abstract: The ability to accurately track hand movement provides new opportunities for human-computer interaction (HCI). Many of today's commercial glove-based hand tracking devices are cumbersome and expensive. An approach that avoids these problems is to use computer vision to capture hand motion. We present a complete real-time hand tracking and 3-D modeling system based on a single camera. In our system, we extract feature points from a video stream of a hand to control a virtual hand model with 2-D global motion and 3-D local motion. The on-screen model gives the user instant feedback on the estimated position of the hand, allowing the user to compensate for tracking errors. The system is demonstrated in three example applications: the first uses hand tracking and gestures to take on the role of the mouse, the second interacts with a 3-D virtual environment using the 3-D hand model, and the third is a virtual DJ system controlled by hand motion tracking and gestures.
Citations: 24
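The abstract above mentions driving the virtual hand with 2-D global motion estimated from tracked feature points, without detailing the estimation. One simple possibility, shown below purely as an illustration (not necessarily the authors' method), is a least-squares similarity transform (scale, rotation, translation) between corresponding points in consecutive frames.

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares 2-D similarity transform mapping src points to dst.

    Solves for (a, b, tx, ty) in  x' = a*x - b*y + tx,  y' = b*x + a*y + ty,
    i.e. a combined rotation/scale plus a translation.
    src, dst : (N, 2) arrays of corresponding feature points (N >= 2).
    """
    x, y = src[:, 0], src[:, 1]
    n = len(src)
    A = np.zeros((2 * n, 4))
    A[0::2] = np.column_stack([x, -y, np.ones(n), np.zeros(n)])   # rows for x'
    A[1::2] = np.column_stack([y, x, np.zeros(n), np.ones(n)])    # rows for y'
    rhs = dst.reshape(-1)
    (a, b, tx, ty), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    scale = np.hypot(a, b)
    angle = np.arctan2(b, a)
    return scale, angle, np.array([tx, ty])

# Toy usage: points rotated by 10 degrees, scaled by 1.1, shifted by (5, -3).
rng = np.random.default_rng(2)
src = rng.uniform(0, 100, size=(8, 2))
th, s, t = np.deg2rad(10), 1.1, np.array([5.0, -3.0])
R = s * np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
dst = src @ R.T + t
print(fit_similarity(src, dst))   # ~ (1.1, 0.1745 rad, [5, -3])
```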
Context-based multimodal input understanding in conversational systems
Proceedings. Fourth IEEE International Conference on Multimodal Interfaces | Pub Date: 2002-10-14 | DOI: 10.1109/ICMI.2002.1166974
J. Chai, Shimei Pan, Michelle X. Zhou, K. Houck
Abstract: In a multimodal human-machine conversation, user inputs are often abbreviated or imprecise, and merely fusing multimodal inputs together cannot always derive a complete understanding. To address these inadequacies, we are building a semantics-based multimodal interpretation framework called MIND (Multimodal Interpretation for Natural Dialog). The unique feature of MIND is the use of a variety of contexts (e.g., domain context and conversation context) to enhance multimodal fusion. In this paper we present a semantically rich modeling scheme and a context-based approach that enable MIND to gain a full understanding of user inputs, including ambiguous and incomplete ones.
Citations: 20
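MIND is described above only at the level of semantic modeling plus domain and conversation context; no concrete data structures are given. The toy sketch below is an invented, drastically simplified reading in which a partial spoken request is completed first from a gesture referent and then from conversation context; the frame fields and slot names are illustrative, not MIND's.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Interpretation:
    """A toy semantic frame for one user turn (not MIND's actual scheme)."""
    intent: Optional[str] = None        # e.g. "show_attribute"
    referent: Optional[str] = None      # object the user is talking about
    attributes: dict = field(default_factory=dict)

def fuse(speech: Interpretation, gesture: Interpretation,
         context: Interpretation) -> Interpretation:
    """Fill gaps in the fused interpretation from the other modality first,
    then from the conversation context (e.g. the previously discussed object)."""
    return Interpretation(
        intent=speech.intent or gesture.intent or context.intent,
        referent=speech.referent or gesture.referent or context.referent,
        attributes={**context.attributes, **gesture.attributes, **speech.attributes},
    )

# "Show me the price of this one" + pointing gesture, with a prior topic.
speech = Interpretation(intent="show_attribute", attributes={"attribute": "price"})
gesture = Interpretation(referent="house_42")
context = Interpretation(referent="house_17", attributes={"city": "Boston"})
print(fuse(speech, gesture, context))
# -> intent='show_attribute', referent='house_42' (the gesture wins over context)
```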
Head-pose invariant facial expression recognition using convolutional neural networks
Proceedings. Fourth IEEE International Conference on Multimodal Interfaces | Pub Date: 2002-10-14 | DOI: 10.1109/ICMI.2002.1167051
B. Fasel
Abstract: Automatic face analysis has to cope with pose and lighting variations. Pose variations are particularly difficult to tackle, and many face analysis methods require sophisticated normalization and initialization procedures. We propose a data-driven face analysis approach that is not only capable of extracting features relevant to a given face analysis task, but is also more robust to face location changes and scale variations than classical methods such as MLPs. Our approach is based on convolutional neural networks that use multi-scale feature extractors, which allow for improved facial expression recognition results on faces subject to in-plane pose variations.
Citations: 55
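The robustness claimed above comes from convolutional networks with multi-scale feature extractors, but the architecture is not specified in the abstract. The PyTorch sketch below only illustrates the multi-scale idea with parallel convolution branches of different kernel sizes feeding one classifier; the layer sizes, input resolution, and seven expression classes are assumptions, not the paper's network.

```python
import torch
import torch.nn as nn

class MultiScaleExpressionNet(nn.Module):
    """Toy multi-scale CNN: parallel convolution branches with different
    receptive-field sizes, concatenated before classification."""

    def __init__(self, n_classes: int = 7):
        super().__init__()

        def branch(kernel):
            return nn.Sequential(
                nn.Conv2d(1, 8, kernel, padding=kernel // 2), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(8, 16, kernel, padding=kernel // 2), nn.ReLU(),
                nn.MaxPool2d(2),
            )

        # Small, medium, and large kernels extract features at several scales,
        # which helps tolerate shifts and scale changes of the face.
        self.branches = nn.ModuleList([branch(k) for k in (3, 5, 7)])
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16 * 3, n_classes),
        )

    def forward(self, x):
        feats = torch.cat([b(x) for b in self.branches], dim=1)
        return self.classifier(feats)

# One 64x64 grayscale face image -> 7 expression logits.
net = MultiScaleExpressionNet()
print(net(torch.randn(1, 1, 64, 64)).shape)   # torch.Size([1, 7])
```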
Mobile multi-modal data services for GPRS phones and beyond
Proceedings. Fourth IEEE International Conference on Multimodal Interfaces | Pub Date: 2002-10-14 | DOI: 10.1109/ICMI.2002.1167018
Georg Niklfeld, Michael Pucher, R. Finan, W. Eckhart
Abstract: The paper discusses means of building multi-modal data services on existing GPRS infrastructure, and puts the proposed simple solutions in the perspective of technological possibilities that will become available in public mobile communications networks over the next few years, along the progression from 2G/GSM systems through GPRS to 3G systems such as UMTS, or equivalently to 802.11 networks. Three demonstrators are presented, developed by the authors in an application-oriented research project co-financed by telecommunications companies. The first two, push-to-talk address entry for a route finder and an open-microphone map-content navigator, simulate a UMTS or WLAN scenario. The third implements a multi-modal map finder in a live public GPRS network using WAP Push. Indications of usability are given. The paper argues for the importance of open, standards-based architectures that can spur attractive multi-modal services in the short term, as current economic difficulties in the telecommunications industry put support for long-term research into more advanced forms of multi-modality in question.
Citations: 6