用多模态信息增强机器智能

Zhou Yu
{"title":"用多模态信息增强机器智能","authors":"Zhou Yu","doi":"10.1145/3423325.3424123","DOIUrl":null,"url":null,"abstract":"Humans interact with other humans or the world through information from various channels including vision, audio, language, haptics, etc. To simulate intelligence, machines require similar abilities to process and combine information from different channels to acquire better situation awareness, better communication ability, and better decision-making ability. In this talk, we describe three projects. In the first study, we enable a robot to utilize both vision and audio information to achieve better user understanding [1]. Then we use incremental language generation to improve the robot's communication with a human. In the second study, we utilize multimodal history tracking to optimize policy planning in task-oriented visual dialogs. In the third project, we tackle the well-known trade-off between dialog response relevance and policy effectiveness in visual dialog generation. We propose a new machine learning procedure that alternates from supervised learning and reinforcement learning to optimum language generation and policy planning jointly in visual dialogs [2]. We will also cover some recent ongoing work on image synthesis through dialogs.","PeriodicalId":142947,"journal":{"name":"Proceedings of the 1st International Workshop on Multimodal Conversational AI","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Augment Machine Intelligence with Multimodal Information\",\"authors\":\"Zhou Yu\",\"doi\":\"10.1145/3423325.3424123\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Humans interact with other humans or the world through information from various channels including vision, audio, language, haptics, etc. To simulate intelligence, machines require similar abilities to process and combine information from different channels to acquire better situation awareness, better communication ability, and better decision-making ability. In this talk, we describe three projects. In the first study, we enable a robot to utilize both vision and audio information to achieve better user understanding [1]. Then we use incremental language generation to improve the robot's communication with a human. In the second study, we utilize multimodal history tracking to optimize policy planning in task-oriented visual dialogs. In the third project, we tackle the well-known trade-off between dialog response relevance and policy effectiveness in visual dialog generation. We propose a new machine learning procedure that alternates from supervised learning and reinforcement learning to optimum language generation and policy planning jointly in visual dialogs [2]. We will also cover some recent ongoing work on image synthesis through dialogs.\",\"PeriodicalId\":142947,\"journal\":{\"name\":\"Proceedings of the 1st International Workshop on Multimodal Conversational AI\",\"volume\":\"16 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-10-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 1st International Workshop on Multimodal Conversational AI\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3423325.3424123\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 1st International Workshop on Multimodal Conversational AI","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3423325.3424123","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

摘要

人类通过视觉、听觉、语言、触觉等各种渠道的信息与他人或世界互动。为了模拟智能,机器需要类似的能力来处理和组合来自不同渠道的信息,以获得更好的态势感知、更好的沟通能力和更好的决策能力。在这次演讲中,我们将介绍三个项目。在第一项研究中,我们使机器人能够同时利用视觉和音频信息来实现更好的用户理解[1]。然后,我们使用增量语言生成来改善机器人与人的沟通。在第二项研究中,我们利用多模态历史跟踪来优化面向任务的可视化对话中的策略规划。在第三个项目中,我们解决了在视觉对话生成中对话响应相关性和策略有效性之间众所周知的权衡。我们提出了一种新的机器学习过程,从监督学习和强化学习到视觉对话中的最佳语言生成和策略规划[2]。我们还将介绍一些最近正在进行的通过对话框进行图像合成的工作。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Augment Machine Intelligence with Multimodal Information
Humans interact with other humans or the world through information from various channels including vision, audio, language, haptics, etc. To simulate intelligence, machines require similar abilities to process and combine information from different channels to acquire better situation awareness, better communication ability, and better decision-making ability. In this talk, we describe three projects. In the first study, we enable a robot to utilize both vision and audio information to achieve better user understanding [1]. Then we use incremental language generation to improve the robot's communication with a human. In the second study, we utilize multimodal history tracking to optimize policy planning in task-oriented visual dialogs. In the third project, we tackle the well-known trade-off between dialog response relevance and policy effectiveness in visual dialog generation. We propose a new machine learning procedure that alternates from supervised learning and reinforcement learning to optimum language generation and policy planning jointly in visual dialogs [2]. We will also cover some recent ongoing work on image synthesis through dialogs.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信