{"title":"Put That There: 20 Years of Research on Multimodal Interaction","authors":"J. Crowley","doi":"10.1145/3242969.3276309","DOIUrl":null,"url":null,"abstract":"Humans interact with the world using five major senses: sight, hearing, touch, smell, and taste. Almost all interaction with the environment is naturally multimodal, as audio, tactile or paralinguistic cues provide confirmation for physical actions and spoken language interaction. Multimodal interaction seeks to fully exploit these parallel channels for perception and action to provide robust, natural interaction. Richard Bolt's \"Put That There\" (1980) provided an early paradigm that demonstrated the power of multimodality and helped attract researchers from a variety of disciplines to study a new approach for post-WIMP computing that moves beyond desktop graphical user interfaces (GUI). In this talk, I will look back to the origins of the scientific community of multimodal interaction, and review some of the more salient results that have emerged over the last 20 years, including results in machine perception, system architectures, visualization, and computer to human communications. Recently, a number of game-changing technologies such as deep learning, cloud computing, and planetary scale data collection have emerged to provide robust solutions to historically hard problems. As a result, scientific understanding of multimodal interaction has taken on new relevance as construction of practical systems has become feasible. I will discuss the impact of these new technologies and the opportunities and challenges that they raise. I will conclude with a discussion of the importance of convergence with cognitive science and cognitive systems to provide foundations for intelligent, human-centered interactive systems that learn and fully understand humans and human-to-human social interaction, in order to provide services that surpass the abilities of the most intelligent human servants.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"15 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3242969.3276309","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Humans interact with the world using five major senses: sight, hearing, touch, smell, and taste. Almost all interaction with the environment is naturally multimodal, as audio, tactile, or paralinguistic cues provide confirmation for physical actions and spoken language interaction. Multimodal interaction seeks to fully exploit these parallel channels for perception and action to provide robust, natural interaction. Richard Bolt's "Put That There" (1980) provided an early paradigm that demonstrated the power of multimodality and helped attract researchers from a variety of disciplines to study a new approach for post-WIMP computing that moves beyond desktop graphical user interfaces (GUIs). In this talk, I will look back at the origins of the scientific community of multimodal interaction and review some of the more salient results that have emerged over the last 20 years, including results in machine perception, system architectures, visualization, and computer-to-human communication. Recently, a number of game-changing technologies such as deep learning, cloud computing, and planetary-scale data collection have emerged to provide robust solutions to historically hard problems. As a result, scientific understanding of multimodal interaction has taken on new relevance as the construction of practical systems has become feasible. I will discuss the impact of these new technologies and the opportunities and challenges that they raise. I will conclude with a discussion of the importance of convergence with cognitive science and cognitive systems to provide foundations for intelligent, human-centered interactive systems that learn and fully understand humans and human-to-human social interaction, in order to provide services that surpass the abilities of the most intelligent human servants.