Proceedings of the 1st International Workshop on Multimodal Conversational AI: Latest Publications

Automatic Speech Recognition and Natural Language Understanding for Emotion Detection in Multi-party Conversations
Pub Date: 2020-10-16 | DOI: 10.1145/3423325.3423737
Ilja Popovic, D. Culibrk, Milan Mirković, S. Vukmirović
{"title":"Automatic Speech Recognition and Natural Language Understanding for Emotion Detection in Multi-party Conversations","authors":"Ilja Popovic, D. Culibrk, Milan Mirković, S. Vukmirović","doi":"10.1145/3423325.3423737","DOIUrl":"https://doi.org/10.1145/3423325.3423737","url":null,"abstract":"Conversational emotion and sentiment analysis approaches rely on Natural Language Understanding (NLU) and audio processing components to achieve the goal of detecting emotions and sentiment based on what is being said. While there has been marked progress in pushing the state-of-the-art of theses methods on benchmark multimodal data sets, such as the Multimodal EmotionLines Dataset (MELD), the advances still seem to lag behind what has been achieved in the domain of mainstream Automatic Speech Recognition (ASR) and NLU applications and we were unable to identify any widely used products, services or production-ready systems that would enable the user to reliably detect emotions from audio recordings of multi-party conversations. Published, state-of-the-art scientific studies of multi-view emotion recognition seem to take it for granted that a human-generated or edited transcript is available as input to the NLU modules, providing no information of what happens in a realistic application scenario, where audio only is available and the NLU processing has to rely on text generated by ASR. Motivated by this insight, we present a study designed to evaluate the possibility of applying widely-used state-of-the-art commercial ASR products as the initial audio processing component in an emotion-from-speech detection system. We propose an approach which relies on commercially available products and services, such as Google Speech-to-Text, Mozilla DeepSpeech and the NVIDIA NeMo toolkit to process the audio and applies state-of-the-art NLU approaches for emotion recognition, in order to quickly create a robust, production-ready emotion-from-speech detection system applicable to multi-party conversations.","PeriodicalId":142947,"journal":{"name":"Proceedings of the 1st International Workshop on Multimodal Conversational AI","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114510811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
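The abstract above describes a two-stage pipeline: a commercial ASR service produces a transcript of each speaker turn, and a text-based emotion model classifies it. The following is a minimal sketch of such a pipeline, assuming the google-cloud-speech client for the ASR step and a Hugging Face text-classification model as a stand-in for the NLU emotion module; the concrete services, models, and configuration used in the paper are not specified here, and the model id below is only an example.

# Hedged sketch: commercial ASR followed by text-based emotion classification.
from google.cloud import speech
from transformers import pipeline

def transcribe(wav_bytes: bytes, language: str = "en-US") -> str:
    """Send one utterance to the ASR service and return the top transcript."""
    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code=language,
    )
    audio = speech.RecognitionAudio(content=wav_bytes)
    response = client.recognize(config=config, audio=audio)
    return " ".join(r.alternatives[0].transcript for r in response.results)

# Text emotion classifier standing in for the NLU component; any MELD-style
# emotion model could be substituted (the model id is an assumption).
emotion_classifier = pipeline(
    "text-classification",
    model="j-hartmann/emotion-english-distilroberta-base",
)

def emotion_from_speech(wav_bytes: bytes) -> str:
    """Map the ASR transcript of a single speaker turn to an emotion label."""
    transcript = transcribe(wav_bytes)
    return emotion_classifier(transcript)[0]["label"]

Swapping in Mozilla DeepSpeech or an NVIDIA NeMo model would only change the transcribe step; the downstream emotion classifier operates on plain text either way, which is exactly the dependency on ASR quality the study sets out to evaluate.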
Augment Machine Intelligence with Multimodal Information
Pub Date: 2020-10-16 | DOI: 10.1145/3423325.3424123
Zhou Yu
{"title":"Augment Machine Intelligence with Multimodal Information","authors":"Zhou Yu","doi":"10.1145/3423325.3424123","DOIUrl":"https://doi.org/10.1145/3423325.3424123","url":null,"abstract":"Humans interact with other humans or the world through information from various channels including vision, audio, language, haptics, etc. To simulate intelligence, machines require similar abilities to process and combine information from different channels to acquire better situation awareness, better communication ability, and better decision-making ability. In this talk, we describe three projects. In the first study, we enable a robot to utilize both vision and audio information to achieve better user understanding [1]. Then we use incremental language generation to improve the robot's communication with a human. In the second study, we utilize multimodal history tracking to optimize policy planning in task-oriented visual dialogs. In the third project, we tackle the well-known trade-off between dialog response relevance and policy effectiveness in visual dialog generation. We propose a new machine learning procedure that alternates from supervised learning and reinforcement learning to optimum language generation and policy planning jointly in visual dialogs [2]. We will also cover some recent ongoing work on image synthesis through dialogs.","PeriodicalId":142947,"journal":{"name":"Proceedings of the 1st International Workshop on Multimodal Conversational AI","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124786813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
FUN-Agent: A 2020 HUMAINE Competition Entrant
Pub Date: 2020-10-16 | DOI: 10.1145/3423325.3423736
R. Geraghty, James Hale, S. Sen, Timothy S. Kroecker
{"title":"FUN-Agent: A 2020 HUMAINE Competition Entrant","authors":"R. Geraghty, James Hale, S. Sen, Timothy S. Kroecker","doi":"10.1145/3423325.3423736","DOIUrl":"https://doi.org/10.1145/3423325.3423736","url":null,"abstract":"Of late, there has been a significant surge of interest in industry and the general populace about future potential of human-AI collaboration [20]. Academic researchers have been pushing the frontier of new modalities of peer-level and ad-hoc human agent collaboration [10;22] for a longer period. We have been particularly interested in research on agents representing human users in negotiating deals with other human and autonomous agents [12;16;18]. Here we present the design for the conversational aspect of our agent entry into the HUMAINE League of the 2020 Automated Negotiation Agent Competition (ANAC). We discuss how our agent utilizes conversational and negotiation strategies, that mimic those used in human negotiations, to maximize its utility as a simulated street vendor. We leverage verbal influence tactics, offer pricing, and increasing human convenience to entice the buyer, build trust and discourage exploitation. Additionally, we discuss the results of some in-house testing we conducted.","PeriodicalId":142947,"journal":{"name":"Proceedings of the 1st International Workshop on Multimodal Conversational AI","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123225051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Assisted Speech to Enable Second Language
Pub Date: 2020-10-16 | DOI: 10.1145/3423325.3423735
Mehmet Altinkaya, A. Smeulders
{"title":"Assisted Speech to Enable Second Language","authors":"Mehmet Altinkaya, A. Smeulders","doi":"10.1145/3423325.3423735","DOIUrl":"https://doi.org/10.1145/3423325.3423735","url":null,"abstract":"Speaking a second language (L2) is a desired capability for billionsof people. Currently, the only way to achieve it naturally is througha lengthy and tedious training, which ends up various stages offluency. The process is far away from the natural acquisition of alanguage.In this paper, we propose a system that enables any person withsome basic understanding of L2 speak fluently through \"Instant As-sistance\" provided by digital conversational agents such as GoogleAssistant, Microsoft Cortana, or Apple Siri, which monitors thespeaker. It attends to provide assistance to continue to speak whenspeech is interrupted as it is not yet completely mastered. The notyet acquired elements of language can be missing words, unfa-miliarity with expressions, the implicit rules of articles, and thehabits of sayings. We can employ the hardware and software of theassistants to create an immersive, adaptive learning environmentto train the speaker online by a symbiotic interaction for implicit,unnoticeable correction.","PeriodicalId":142947,"journal":{"name":"Proceedings of the 1st International Workshop on Multimodal Conversational AI","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125822880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Motivation and Design of the Conversational Components of DraftAgent for Human-Agent Negotiation
Pub Date: 2020-10-16 | DOI: 10.1145/3423325.3423734
Dale Peasley, Michael Naguib, Bohan Xu, S. Sen, Timothy S. Kroecker
{"title":"Motivation and Design of the Conversational Components of DraftAgent for Human-Agent Negotiation","authors":"Dale Peasley, Michael Naguib, Bohan Xu, S. Sen, Timothy S. Kroecker","doi":"10.1145/3423325.3423734","DOIUrl":"https://doi.org/10.1145/3423325.3423734","url":null,"abstract":"In sync with the significant interest in industry and the general populace about future potential of human-AI collaboration [14], academic researchers have been pushing the frontier of new modalities of peer-level and ad-hoc human agent collaboration [4,15]. We have been particularly interested in research on agents representing human users in negotiating deals with other human and autonomous agents [6,11,13]. We present the design motivation and key components of the conversational aspect of our agent entry into the Human-Agent League(HAL) (http://web.tuat.ac.jp/~katfuji/ANAC2020/cfp/ham_cfp.pdf )of the 2020 Automated Negotiation Agent Competition (ANAC). We explore how language can be used to promote human-agent collaboration even in the domain of a competitive negotiation. We present small scale in-lab testing to demonstrate the potential of our approach.","PeriodicalId":142947,"journal":{"name":"Proceedings of the 1st International Workshop on Multimodal Conversational AI","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127690656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
A Dynamic, Self Supervised, Large Scale AudioVisual Dataset for Stuttered Speech
Pub Date: 2020-10-16 | DOI: 10.1145/3423325.3423733
Mehmet Altinkaya, A. Smeulders
{"title":"A Dynamic, Self Supervised, Large Scale AudioVisual Dataset for Stuttered Speech","authors":"Mehmet Altinkaya, A. Smeulders","doi":"10.1145/3423325.3423733","DOIUrl":"https://doi.org/10.1145/3423325.3423733","url":null,"abstract":"Stuttering affects at least 1% of the world population. It is caused by irregular disruptions in speech production. These interruptions occur in various forms and frequencies. Repetition of words or parts of words, prolongations, or blocks in getting the words out are the most common ones. Accurate detection and classification of stuttering would be important in the assessment of severity for speech therapy. Furthermore, real time detection might create many new possibilities to facilitate reconstruction into fluent speech. Such an interface could help people to utilize voice-based interfaces like Apple Siri and Google Assistant, or to make (video) phone calls more fluent by delayed delivery. In this paper we present the first expandable audio-visual database of stuttered speech. We explore an end-to-end, real-time, multi-modal model for detection and classification of stuttered blocks in unbound speech. We also make use of video signals since acoustic signals cannot be produced immediately. We use multiple modalities as acoustic signals together with secondary characteristics exhibited in visual signals will permit an increased accuracy of detection.","PeriodicalId":142947,"journal":{"name":"Proceedings of the 1st International Workshop on Multimodal Conversational AI","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131247702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
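The abstract above outlines an end-to-end multi-modal detector that fuses acoustic and visual streams. Below is a minimal late-fusion sketch in PyTorch, assuming precomputed per-window audio and visual feature sequences; the feature dimensions, GRU encoders, and four-class disfluency label set are illustrative assumptions, not the authors' architecture.

# Hedged sketch: late fusion of audio and visual feature sequences for
# window-level stutter-block classification.
import torch
import torch.nn as nn

class AudioVisualStutterClassifier(nn.Module):
    def __init__(self, audio_dim=128, visual_dim=256, hidden=64, n_classes=4):
        super().__init__()
        # One sequence encoder per modality.
        self.audio_enc = nn.GRU(audio_dim, hidden, batch_first=True)
        self.visual_enc = nn.GRU(visual_dim, hidden, batch_first=True)
        # Example label set: fluent, repetition, prolongation, block.
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, audio_seq, visual_seq):
        # audio_seq: (batch, T_audio, audio_dim); visual_seq: (batch, T_video, visual_dim)
        _, h_audio = self.audio_enc(audio_seq)
        _, h_visual = self.visual_enc(visual_seq)
        fused = torch.cat([h_audio[-1], h_visual[-1]], dim=-1)  # final hidden states
        return self.head(fused)  # logits over disfluency classes for this window

Because articulation effort can be visible before any sound is produced, a fused window-level classifier of this shape can in principle flag a block earlier than an audio-only model, which is the motivation the abstract gives for adding the video signal.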
Proceedings of the 1st International Workshop on Multimodal Conversational AI (front matter)
{"title":"Proceedings of the 1st International Workshop on Multimodal Conversational AI","authors":"","doi":"10.1145/3423325","DOIUrl":"https://doi.org/10.1145/3423325","url":null,"abstract":"","PeriodicalId":142947,"journal":{"name":"Proceedings of the 1st International Workshop on Multimodal Conversational AI","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124322834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0