Augment Machine Intelligence with Multimodal Information

Zhou Yu
{"title":"Augment Machine Intelligence with Multimodal Information","authors":"Zhou Yu","doi":"10.1145/3423325.3424123","DOIUrl":null,"url":null,"abstract":"Humans interact with other humans or the world through information from various channels including vision, audio, language, haptics, etc. To simulate intelligence, machines require similar abilities to process and combine information from different channels to acquire better situation awareness, better communication ability, and better decision-making ability. In this talk, we describe three projects. In the first study, we enable a robot to utilize both vision and audio information to achieve better user understanding [1]. Then we use incremental language generation to improve the robot's communication with a human. In the second study, we utilize multimodal history tracking to optimize policy planning in task-oriented visual dialogs. In the third project, we tackle the well-known trade-off between dialog response relevance and policy effectiveness in visual dialog generation. We propose a new machine learning procedure that alternates from supervised learning and reinforcement learning to optimum language generation and policy planning jointly in visual dialogs [2]. We will also cover some recent ongoing work on image synthesis through dialogs.","PeriodicalId":142947,"journal":{"name":"Proceedings of the 1st International Workshop on Multimodal Conversational AI","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2020-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 1st International Workshop on Multimodal Conversational AI","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3423325.3424123","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Humans interact with other humans and with the world through information from various channels, including vision, audio, language, and haptics. To simulate intelligence, machines require similar abilities to process and combine information from different channels in order to acquire better situational awareness, communication, and decision-making abilities. In this talk, we describe three projects. In the first study, we enable a robot to utilize both vision and audio information to achieve better user understanding [1]. We then use incremental language generation to improve the robot's communication with humans. In the second study, we utilize multimodal history tracking to optimize policy planning in task-oriented visual dialogs. In the third project, we tackle the well-known trade-off between dialog response relevance and policy effectiveness in visual dialog generation. We propose a new machine learning procedure that alternates between supervised learning and reinforcement learning to jointly optimize language generation and policy planning in visual dialogs [2]. We will also cover some recent ongoing work on image synthesis through dialogs.
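To make the alternating training idea in the third project concrete, below is a minimal, self-contained sketch of one common way such a procedure can be set up: a supervised phase that keeps generated language close to human responses (relevance) alternated with a reinforcement-learning phase that optimizes a task reward for the dialog policy (effectiveness). This is an illustrative assumption, not the authors' implementation; the model, data, and reward here (ToyDialogAgent, the random token batches, the random reward) are hypothetical placeholders.

```python
# Sketch of alternating supervised learning (SL) and reinforcement learning (RL)
# phases for a dialog agent. Not the authors' code; all names are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, DIM, N_ACTIONS = 50, 32, 4  # toy sizes; real systems are far larger


class ToyDialogAgent(nn.Module):
    """Shared encoder with two heads: next-token generation and a dialog policy."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.encoder = nn.GRU(DIM, DIM, batch_first=True)
        self.gen_head = nn.Linear(DIM, VOCAB)          # language generation
        self.policy_head = nn.Linear(DIM, N_ACTIONS)   # dialog policy

    def forward(self, tokens):
        h, _ = self.encoder(self.embed(tokens))
        state = h[:, -1]                 # last hidden state as the dialog state
        return self.gen_head(h), self.policy_head(state)


agent = ToyDialogAgent()
opt = torch.optim.Adam(agent.parameters(), lr=1e-3)


def sl_step(tokens):
    """Supervised phase: imitate human responses with token-level cross-entropy."""
    logits, _ = agent(tokens[:, :-1])
    loss = F.cross_entropy(logits.reshape(-1, VOCAB), tokens[:, 1:].reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()


def rl_step(tokens):
    """RL phase: REINFORCE on a task reward for the sampled dialog action."""
    _, action_logits = agent(tokens)
    dist = torch.distributions.Categorical(logits=action_logits)
    action = dist.sample()
    reward = torch.rand(action.shape)    # placeholder for a real task-success signal
    loss = -(dist.log_prob(action) * reward).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()


# Alternating the phases keeps the RL objective from drifting the generator away
# from the fluent, relevant language learned in the supervised phase.
for epoch in range(3):
    batch = torch.randint(0, VOCAB, (8, 10))  # stand-in for real dialog batches
    print(f"epoch {epoch}: SL loss {sl_step(batch):.3f}, RL loss {rl_step(batch):.3f}")
```

In practice the two phases would draw from different data sources (human-human dialogs for SL, self-play or user-simulator rollouts for RL), and the reward would reflect task completion in the visual dialog rather than the random placeholder used here.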