Proceedings. Fourth IEEE International Conference on Multimodal Interfaces: Latest Publications

Embarking on multimodal interface design
Proceedings. Fourth IEEE International Conference on Multimodal Interfaces. Pub Date: 2002-10-14. DOI: 10.1109/ICMI.2002.1167021
A. K. Sinha, J. Landay
Abstract: Designers are increasingly faced with the challenge of targeting multimodal applications, those that span heterogeneous devices and use multimodal input, but they lack tools to support this work. We studied the early-stage work practices of professional multimodal interaction designers. We noted the variety of artifacts produced, such as design sketches and paper prototypes. Additionally, we observed Wizard of Oz techniques that are sometimes used to simulate an interactive application from these sketches. These studies have led to our development of a technique for interface designers to consider as they embark on creating multimodal applications.
Citations: 12
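The Wizard of Oz technique mentioned above, where a hidden experimenter plays the part of the system behind a sketched interface, can be approximated with a very small relay: whatever the wizard types is shown to the participant as if the prototype had produced it. A minimal Python sketch under assumed details (the port number and plain-text message format are invented for illustration, not taken from the paper):

```python
# Minimal Wizard of Oz relay (illustrative sketch, not from the paper).
# Run "python woz_relay.py wizard" on the experimenter's machine and
# "python woz_relay.py participant <host>" on the participant's machine:
# whatever the hidden wizard types is displayed as system output.
import socket
import sys

PORT = 5005  # hypothetical port, chosen for this sketch


def wizard() -> None:
    """Accept one participant connection and forward typed responses to it."""
    with socket.create_server(("", PORT)) as server:
        conn, addr = server.accept()
        print(f"participant connected from {addr}")
        with conn:
            while True:
                reply = input("wizard> ")            # the human plays the system
                conn.sendall(reply.encode() + b"\n")


def participant(host: str) -> None:
    """Connect to the wizard and display each simulated system response."""
    with socket.create_connection((host, PORT)) as sock:
        for line in sock.makefile():
            print(f"[system] {line.rstrip()}")       # rendered as system output


if __name__ == "__main__":
    if sys.argv[1] == "wizard":
        wizard()
    else:
        participant(sys.argv[2])
```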
Multimodal contextual car-driver interface
Proceedings. Fourth IEEE International Conference on Multimodal Interfaces. Pub Date: 2002-10-14. DOI: 10.1109/ICMI.2002.1167023
D. Siewiorek, A. Smailagic, M. Hornyak
Abstract: This paper focuses on the design and implementation of a companion contextual car-driver interface that proactively assists the driver in managing information and communication. The prototype combines a smart car environment and driver state monitoring, incorporating a wide range of input-output modalities and a display hierarchy. Intelligent agents link information from many contexts, such as location and schedule, and transparently learn from the driver, interacting with the driver only when necessary.
Citations: 19
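The abstract's idea of agents that combine context sources such as location and schedule and contact the driver only when necessary can be pictured as a rule that gates each notification on an estimate of driving load. This is a hypothetical sketch; the context fields, weights, and thresholds are invented for illustration and are not the authors' implementation:

```python
# Hypothetical context-gated notification rule in the spirit of a proactive
# car-driver interface: messages are delivered only when the estimated
# driving load is low enough for the message's priority.
from dataclasses import dataclass


@dataclass
class DriverContext:
    speed_kmh: float                  # from the smart-car environment
    turn_signal_on: bool              # crude proxy for an ongoing maneuver
    minutes_to_next_meeting: float    # from the driver's schedule


def driving_load(ctx: DriverContext) -> float:
    """Combine context cues into a 0..1 load estimate (illustrative weights)."""
    load = 0.0
    load += 0.5 if ctx.speed_kmh > 80 else 0.2 if ctx.speed_kmh > 30 else 0.0
    load += 0.4 if ctx.turn_signal_on else 0.0
    return min(load, 1.0)


def should_interrupt(ctx: DriverContext, message_priority: float) -> bool:
    """Interrupt only when the message's priority exceeds the current load."""
    urgent_schedule = ctx.minutes_to_next_meeting < 10   # schedule raises priority
    return (message_priority + (0.3 if urgent_schedule else 0.0)) > driving_load(ctx)


if __name__ == "__main__":
    ctx = DriverContext(speed_kmh=95, turn_signal_on=True, minutes_to_next_meeting=5)
    print(should_interrupt(ctx, message_priority=0.4))   # False: driver is busy
```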
Evaluating integrated speech- and image understanding
Proceedings. Fourth IEEE International Conference on Multimodal Interfaces. Pub Date: 2002-10-14. DOI: 10.1109/ICMI.2002.1166961
C. Bauckhage, J. Fritsch, K. Rohlfing, S. Wachsmuth, G. Sagerer
Abstract: The capability to coordinate and interrelate speech and vision is a virtual prerequisite for adaptive, cooperative, and flexible interaction among people. It is therefore fair to assume that human-machine interaction, too, would benefit from intelligent interfaces for integrated speech and image processing. We first sketch an interactive system that integrates automatic speech processing with image understanding. Then, we concentrate on performance assessment, which we believe is an emerging key issue in multimodal interaction. We explain the benefit of time scale analysis and usability studies and evaluate our system accordingly.
Citations: 18
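The time scale analysis the authors advocate boils down to measuring how long each stage of the integrated system takes and what share of the end-to-end time it accounts for. A minimal sketch, assuming timestamped per-module log records; the record format and the example numbers are assumptions made for illustration:

```python
# Hypothetical time-scale analysis over timestamped module logs: for each
# processing stage (speech recognition, image understanding, integration)
# report mean latency and its share of total processing time.
from collections import defaultdict
from statistics import mean

# (utterance_id, stage, start_seconds, end_seconds) -- invented example data
LOG = [
    ("u1", "speech", 0.0, 1.2),
    ("u1", "vision", 0.0, 0.8),
    ("u1", "integration", 1.2, 1.5),
    ("u2", "speech", 5.0, 6.6),
    ("u2", "vision", 5.0, 5.7),
    ("u2", "integration", 6.6, 7.0),
]


def stage_latencies(log):
    """Group per-record durations by processing stage."""
    durations = defaultdict(list)
    for _, stage, start, end in log:
        durations[stage].append(end - start)
    return durations


if __name__ == "__main__":
    per_stage = stage_latencies(LOG)
    total = sum(sum(v) for v in per_stage.values())
    for stage, values in per_stage.items():
        share = 100.0 * sum(values) / total
        print(f"{stage:12s} mean={mean(values):.2f}s share={share:.0f}%")
```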
Modular approach of multimodal integration in a virtual environment
Proceedings. Fourth IEEE International Conference on Multimodal Interfaces. Pub Date: 2002-10-14. DOI: 10.1109/ICMI.2002.1167017
Rajarathinam Arangarasan, George N. Phillips
Abstract: We present a novel modular approach to integrating multiple input/output (I/O) modes in a virtual environment that imitates natural, intuitive, and effective human interaction behavior. The I/O modes used in this research are spatial tracking of both hands, finger gesture recognition, head/body spatial tracking, voice recognition (discrete recognition for simple commands and continuous recognition for natural language input), immersive stereo display, and synthesized speech output. Intuitive natural interaction is achieved through several stages: identifying all the tasks that need to be performed, then grouping similar tasks and assigning each group to a particular mode so that it imitates the physical world. This modular approach allows inclusion and removal of additional input and output modes as well as additional users. We describe this multimodal interaction paradigm by applying it to a real-world application: visualizing, modeling, and fitting protein molecular structures in an immersive virtual environment.
Citations: 10
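The modular scheme of grouping tasks and binding each group to an input or output mode, with modes added or removed without disturbing the rest, can be pictured as a small registry. A hypothetical sketch; the mode and task names are illustrative and not taken from the paper:

```python
# Hypothetical registry binding task groups to I/O modes; a mode (and the
# tasks it serves) can be registered or withdrawn without touching the others.
from typing import Callable, Dict

Handler = Callable[[str], None]


class ModalityRegistry:
    def __init__(self) -> None:
        self._handlers: Dict[str, Dict[str, Handler]] = {}  # mode -> task -> handler

    def register(self, mode: str, task: str, handler: Handler) -> None:
        self._handlers.setdefault(mode, {})[task] = handler

    def unregister_mode(self, mode: str) -> None:
        """Remove a whole input/output mode, e.g. when its device is unplugged."""
        self._handlers.pop(mode, None)

    def dispatch(self, mode: str, task: str, payload: str) -> None:
        handler = self._handlers.get(mode, {}).get(task)
        if handler is None:
            print(f"no handler for {task!r} on {mode!r}")  # graceful degradation
        else:
            handler(payload)


if __name__ == "__main__":
    registry = ModalityRegistry()
    registry.register("voice", "rotate_molecule", lambda p: print("rotating:", p))
    registry.register("hand_gesture", "grab_molecule", lambda p: print("grabbing:", p))
    registry.dispatch("voice", "rotate_molecule", "90 degrees about x")
    registry.unregister_mode("hand_gesture")
    registry.dispatch("hand_gesture", "grab_molecule", "residue 42")  # now unhandled
```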
A multimodal electronic travel aid device
Proceedings. Fourth IEEE International Conference on Multimodal Interfaces. Pub Date: 2002-10-14. DOI: 10.1109/ICMI.2002.1166966
Andrea Fusiello, Antonello Panuccio, Vittorio Murino, F. Fontana, D. Rocchesso
Abstract: This paper describes an electronic travel aid device that may enable blind individuals to "see the world with their ears". A wearable prototype will be assembled using low-cost hardware: earphones, sunglasses fitted with two micro cameras, and a palmtop computer. The system, which currently runs on a desktop computer, is able to detect the light spot produced by a laser pointer, compute its angular position and depth, and generate a corresponding sound providing auditory cues for perception of the position and distance of the pointed surface patch. It permits different sonification modes that can be chosen by drawing, with the laser pointer, a predefined stroke that is recognized by a hidden Markov model. In this way a blind person can use a common pointer as a replacement for the cane and will interact with the device through a flexible and natural sketch-based interface.
Citations: 18
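The core loop described above, locating the laser spot, estimating its direction and depth, and turning both into an auditory cue, can be sketched under standard pinhole-stereo assumptions. The focal length, camera baseline, and audio mapping below are invented values, not the authors' calibration:

```python
# Hypothetical mapping from a detected laser spot in a stereo image pair to an
# auditory cue: disparity gives depth, horizontal image position gives azimuth,
# and both are mapped onto stereo pan, pitch, and loudness.
import math

FOCAL_PX = 700.0      # assumed focal length in pixels
BASELINE_M = 0.12     # assumed distance between the two micro cameras
IMAGE_WIDTH = 640


def spot_to_geometry(x_left: float, x_right: float) -> tuple:
    """Return (depth in metres, azimuth in radians) from the spot's x-coordinates."""
    disparity = max(x_left - x_right, 1e-6)          # avoid division by zero
    depth = FOCAL_PX * BASELINE_M / disparity
    azimuth = math.atan2(x_left - IMAGE_WIDTH / 2, FOCAL_PX)
    return depth, azimuth


def geometry_to_sound(depth: float, azimuth: float) -> dict:
    """Closer surfaces sound higher and louder; azimuth drives stereo panning."""
    return {
        "pan": max(-1.0, min(1.0, azimuth / (math.pi / 4))),  # -1 left .. +1 right
        "pitch_hz": 200.0 + 800.0 / max(depth, 0.5),
        "gain": min(1.0, 2.0 / max(depth, 0.5)),
    }


if __name__ == "__main__":
    depth, azimuth = spot_to_geometry(x_left=380.0, x_right=352.0)
    print(round(depth, 2), "m", geometry_to_sound(depth, azimuth))
```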
Integrating emotional cues into a framework for dialogue management
Proceedings. Fourth IEEE International Conference on Multimodal Interfaces. Pub Date: 2002-10-14. DOI: 10.1109/ICMI.2002.1166983
H. Holzapfel, C. Fügen, Matthias Denecke, A. Waibel
Abstract: Emotions are very important in human-human communication but are usually ignored in human-computer interaction. Recent work focuses on the recognition and generation of emotions as well as emotion-driven behavior. Our work focuses on the use of emotions in dialogue systems that can be used with speech input or in multimodal environments. We describe a framework for using emotional cues in a dialogue system and their informational characterization. We describe emotion models that can be integrated into the dialogue system and used in different domains and tasks. We plan to apply the dialogue system to model multimodal human-computer interaction with a humanoid robotic system.
Citations: 36
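One way to picture emotional cues feeding dialogue management is a dialogue state that carries an emotion estimate alongside the recognized intent, with the next system move switching on it. A hypothetical sketch; the emotion labels, confidence threshold, and dialogue acts are illustrative and not the framework described in the paper:

```python
# Hypothetical dialogue-management step that takes an emotion estimate into
# account: a frustrated user gets a clarification offer instead of another
# open question.
from dataclasses import dataclass
from typing import List


@dataclass
class EmotionalCue:
    label: str         # e.g. "neutral", "frustrated", "pleased"
    confidence: float  # 0..1, from an upstream recognizer


@dataclass
class DialogueState:
    intent: str
    missing_slots: List[str]
    emotion: EmotionalCue


def next_move(state: DialogueState) -> str:
    """Pick the system's next dialogue act based on task state and emotion."""
    if state.emotion.label == "frustrated" and state.emotion.confidence > 0.6:
        return "apologize_and_offer_help"        # de-escalate before re-prompting
    if state.missing_slots:
        return f"ask_for:{state.missing_slots[0]}"
    return "confirm_and_execute"


if __name__ == "__main__":
    state = DialogueState(
        intent="book_meeting_room",
        missing_slots=["time"],
        emotion=EmotionalCue(label="frustrated", confidence=0.8),
    )
    print(next_move(state))  # -> apologize_and_offer_help
```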
Multimodal dialogue systems for interactive TV applications
Proceedings. Fourth IEEE International Conference on Multimodal Interfaces. Pub Date: 2002-10-14. DOI: 10.1109/ICMI.2002.1166979
Aseel Ibrahim, P. Johansson
Abstract: Many studies have shown the advantages of building multimodal systems, but not in the context of interactive TV applications. This paper reports on a qualitative study of a multimodal program guide for interactive TV. The system was designed by adding speech interaction to an existing TV program guide. Results indicate that spoken natural language input combined with visual output is preferable for TV applications. Furthermore, user feedback requires a clear distinction between the dialogue system's domain results and system status in the visual output. Consequently, we propose an interaction model that consists of three entities: user, domain results, and system feedback.
Citations: 34
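The proposed three-entity interaction model suggests keeping the two system-side channels, domain results and system feedback, separate in every response so the visual output can render them distinctly. A minimal sketch of such a response structure; the field names and example programme data are assumptions for illustration:

```python
# Hypothetical response object for a spoken TV programme guide that keeps
# domain results (what was found) separate from system feedback (what the
# system did or still needs), so the UI can render them in distinct areas.
from dataclasses import dataclass, field
from typing import List


@dataclass
class SystemResponse:
    domain_results: List[dict] = field(default_factory=list)  # programmes to display
    system_feedback: str = ""                                  # status / prompts


def handle_utterance(utterance: str) -> SystemResponse:
    """Toy handler: a real system would call a parser and an EPG back end."""
    if "tonight" in utterance and "film" in utterance:
        return SystemResponse(
            domain_results=[{"title": "Example Film", "channel": 3, "starts": "21:00"}],
            system_feedback="Showing films starting tonight.",
        )
    return SystemResponse(system_feedback="Sorry, I did not understand. Please rephrase.")


if __name__ == "__main__":
    response = handle_utterance("which films are on tonight")
    print("RESULTS:", response.domain_results)
    print("STATUS: ", response.system_feedback)
```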
A video based interface to textual information for the visually impaired
Proceedings. Fourth IEEE International Conference on Multimodal Interfaces. Pub Date: 2002-10-14. DOI: 10.1109/ICMI.2002.1167016
Ali Zandifar, R. Duraiswami, Antoine Chahine, L. Davis
Abstract: We describe the development of an interface to textual information for the visually impaired that uses video, image processing, optical character recognition (OCR), and text-to-speech (TTS). The video provides a sequence of low-resolution images in which text must be detected, rectified, and converted into high-resolution rectangular blocks that can be analyzed by off-the-shelf OCR. To achieve this, various problems related to feature detection, mosaicing, auto-focus, zoom, and systems integration were solved in the development of the system.
Citations: 45
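The processing chain in the abstract, detecting text in low-resolution frames, rectifying and mosaicing it into a readable block, running off-the-shelf OCR, and speaking the result, can be laid out as a pipeline skeleton. Each stage below is a stub so the flow runs end to end; only the overall data flow is taken from the abstract:

```python
# Hypothetical pipeline skeleton mirroring the data flow in the abstract:
# video frames -> text detection -> rectified high-resolution block -> OCR -> speech.
# Each stage is a stub; in a real system the stubs would wrap the paper's
# feature detection and mosaicing, an off-the-shelf OCR engine, and a TTS engine.
from typing import List


def detect_text_regions(frame: str) -> List[str]:
    """Stub: return candidate text regions found in one low-resolution frame."""
    return [frame]                      # pretend the whole frame is one region


def rectify_and_mosaic(regions: List[List[str]]) -> str:
    """Stub: warp and stitch the per-frame regions into one readable block."""
    return " ".join(r for frame_regions in regions for r in frame_regions)


def ocr(block: str) -> str:
    """Stub: an off-the-shelf OCR engine would read the block here."""
    return block.upper()


def text_to_speech(text: str) -> None:
    """Stub: a TTS engine would speak the text; here we just print it."""
    print(f"[speaking] {text}")


def read_aloud(frames: List[str]) -> None:
    regions = [detect_text_regions(f) for f in frames]
    text_to_speech(ocr(rectify_and_mosaic(regions)))


if __name__ == "__main__":
    read_aloud(["exit", "to the", "left"])   # -> [speaking] EXIT TO THE LEFT
```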
Multimodal interaction during multiparty dialogues: initial results
Proceedings. Fourth IEEE International Conference on Multimodal Interfaces. Pub Date: 2002-10-14. DOI: 10.1109/ICMI.2002.1167037
Philip R. Cohen, Rachel Coulston, Kelly Krout
Abstract: Groups of people collaborating on a task often incorporate the objects in their shared environment into their discussion. With this comes physical reference to these 3-D objects, including gesture, gaze, haptics, and possibly other modalities, over and above the speech we commonly associate with human-human communication. From a technological perspective, this human style of communication not only poses the challenge for researchers to create multimodal systems capable of integrating input from various modalities, but also to do it well enough that it supports, rather than interferes with, the collaborators' primary goal: their own human-human interaction. This paper offers a first step towards building such multimodal systems for supporting face-to-face collaborative work by providing both qualitative and quantitative analyses of multiparty multimodal dialogues in a field setting.
Citations: 24
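A small example of the kind of quantitative analysis the abstract refers to: given turn-level annotations of which modalities co-occurred, one can count how often a speaker's spoken turns were accompanied by gesture, gaze, or another modality. The annotation format below is invented for illustration and is not the authors' coding scheme:

```python
# Hypothetical tally over annotated multiparty turns: for each speaker, what
# fraction of speaking turns also used another modality (gesture, gaze, haptics)?
from collections import Counter

# (speaker, modalities used in the turn) -- invented example annotations
TURNS = [
    ("A", {"speech"}),
    ("A", {"speech", "gesture"}),
    ("B", {"speech", "gaze"}),
    ("B", {"gesture"}),
    ("C", {"speech", "gesture", "gaze"}),
]


def multimodal_rates(turns):
    """Fraction of each speaker's speaking turns that were multimodal."""
    spoken, multimodal = Counter(), Counter()
    for speaker, modalities in turns:
        if "speech" in modalities:
            spoken[speaker] += 1
            if len(modalities) > 1:
                multimodal[speaker] += 1
    return {s: multimodal[s] / spoken[s] for s in spoken}


if __name__ == "__main__":
    print(multimodal_rates(TURNS))   # e.g. {'A': 0.5, 'B': 1.0, 'C': 1.0}
```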
Integration of tone related feature for Chinese speech recognition
Proceedings. Fourth IEEE International Conference on Multimodal Interfaces. Pub Date: 2002-10-14. DOI: 10.1109/ICMI.2002.1166970
Pui-Fung Wong, M. Siu
Abstract: Chinese is a tonal language that uses fundamental frequency, in addition to phones, for word differentiation. Commonly used front-end features such as mel-frequency cepstral coefficients (MFCCs), however, are optimized for non-tonal languages such as English and do not capture the pitch information that is important for tone identification. In this paper, we examine the integration of tone-related acoustic features for Chinese recognition. We propose the use of the cepstrum method (CEP), which uses the same configuration as MFCC extraction, for deriving pitch-related features. The pitch periods extracted with the CEP algorithm can be used directly for speech recognition and do not require any special treatment of unvoiced frames. In addition, we explore a number of feature transformations and find that adding a properly normalized and transformed set of pitch-related features reduces the recognition error rate from 34.61% to 29.45% on the Chinese 1998 National Performance Assessment (Project 863) corpus.
Citations: 8
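The cepstrum (CEP) pitch estimate the paper builds on can be sketched in a few lines: the real cepstrum of a speech frame tends to peak at the quefrency equal to the pitch period, so picking the largest peak within a plausible lag range yields a per-frame F0. A minimal NumPy sketch under generic assumptions (frame length, sampling rate, and search range are illustrative, not the paper's configuration):

```python
# Minimal cepstrum-based pitch estimate for one speech frame: peak of the real
# cepstrum within the lag range of plausible pitch periods. The value is kept
# even for unvoiced frames, in the spirit of using CEP output directly as a
# recognition feature.
import numpy as np


def cepstral_pitch(frame: np.ndarray, sample_rate: int,
                   fmin: float = 60.0, fmax: float = 400.0) -> float:
    """Return an F0 estimate in Hz for a windowed speech frame."""
    windowed = frame * np.hamming(len(frame))
    spectrum = np.fft.rfft(windowed)
    log_mag = np.log(np.abs(spectrum) + 1e-10)        # log-magnitude spectrum
    cepstrum = np.fft.irfft(log_mag)                   # real cepstrum
    lag_min = int(sample_rate / fmax)                  # shortest plausible period
    lag_max = int(sample_rate / fmin)                  # longest plausible period
    peak_lag = lag_min + np.argmax(cepstrum[lag_min:lag_max])
    return sample_rate / peak_lag


if __name__ == "__main__":
    sr = 16000
    t = np.arange(1024) / sr
    # synthetic harmonic-rich "voiced" frame with 150 Hz fundamental
    frame = sum(np.sin(2 * np.pi * 150.0 * k * t) / k for k in range(1, 8))
    print(round(cepstral_pitch(frame, sr), 1))          # expect a value close to 150
```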