Proceedings of the 1st International Workshop on Multimodal Conversational AI: Latest Publications

Automatic Speech Recognition and Natural Language Understanding for Emotion Detection in Multi-party Conversations
Pub Date: 2020-10-16 | DOI: 10.1145/3423325.3423737
Ilja Popovic, D. Culibrk, Milan Mirković, S. Vukmirović
{"title":"Automatic Speech Recognition and Natural Language Understanding for Emotion Detection in Multi-party Conversations","authors":"Ilja Popovic, D. Culibrk, Milan Mirković, S. Vukmirović","doi":"10.1145/3423325.3423737","DOIUrl":"https://doi.org/10.1145/3423325.3423737","url":null,"abstract":"Conversational emotion and sentiment analysis approaches rely on Natural Language Understanding (NLU) and audio processing components to achieve the goal of detecting emotions and sentiment based on what is being said. While there has been marked progress in pushing the state-of-the-art of theses methods on benchmark multimodal data sets, such as the Multimodal EmotionLines Dataset (MELD), the advances still seem to lag behind what has been achieved in the domain of mainstream Automatic Speech Recognition (ASR) and NLU applications and we were unable to identify any widely used products, services or production-ready systems that would enable the user to reliably detect emotions from audio recordings of multi-party conversations. Published, state-of-the-art scientific studies of multi-view emotion recognition seem to take it for granted that a human-generated or edited transcript is available as input to the NLU modules, providing no information of what happens in a realistic application scenario, where audio only is available and the NLU processing has to rely on text generated by ASR. Motivated by this insight, we present a study designed to evaluate the possibility of applying widely-used state-of-the-art commercial ASR products as the initial audio processing component in an emotion-from-speech detection system. We propose an approach which relies on commercially available products and services, such as Google Speech-to-Text, Mozilla DeepSpeech and the NVIDIA NeMo toolkit to process the audio and applies state-of-the-art NLU approaches for emotion recognition, in order to quickly create a robust, production-ready emotion-from-speech detection system applicable to multi-party conversations.","PeriodicalId":142947,"journal":{"name":"Proceedings of the 1st International Workshop on Multimodal Conversational AI","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114510811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
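The abstract above describes a two-stage pipeline: a commercial ASR service produces a transcript of each speaker turn, and a text-based emotion model classifies it. The following is a minimal sketch of such a pipeline, assuming the google-cloud-speech client for the ASR step and a Hugging Face text-classification model as a stand-in for the NLU emotion module; the concrete services, models, and configuration used in the paper are not specified here, and the model id below is only an example.

# Hedged sketch: commercial ASR followed by text-based emotion classification.
from google.cloud import speech
from transformers import pipeline

def transcribe(wav_bytes: bytes, language: str = "en-US") -> str:
    """Send one utterance to the ASR service and return the top transcript."""
    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code=language,
    )
    audio = speech.RecognitionAudio(content=wav_bytes)
    response = client.recognize(config=config, audio=audio)
    return " ".join(r.alternatives[0].transcript for r in response.results)

# Text emotion classifier standing in for the NLU component; any MELD-style
# emotion model could be substituted (the model id is an assumption).
emotion_classifier = pipeline(
    "text-classification",
    model="j-hartmann/emotion-english-distilroberta-base",
)

def emotion_from_speech(wav_bytes: bytes) -> str:
    """Map the ASR transcript of a single speaker turn to an emotion label."""
    transcript = transcribe(wav_bytes)
    return emotion_classifier(transcript)[0]["label"]

Swapping in Mozilla DeepSpeech or an NVIDIA NeMo model would only change the transcribe step; the downstream emotion classifier operates on plain text either way, which is exactly the dependency on ASR quality the study sets out to evaluate.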
Augment Machine Intelligence with Multimodal Information
Pub Date: 2020-10-16 | DOI: 10.1145/3423325.3424123
Zhou Yu
{"title":"Augment Machine Intelligence with Multimodal Information","authors":"Zhou Yu","doi":"10.1145/3423325.3424123","DOIUrl":"https://doi.org/10.1145/3423325.3424123","url":null,"abstract":"Humans interact with other humans or the world through information from various channels including vision, audio, language, haptics, etc. To simulate intelligence, machines require similar abilities to process and combine information from different channels to acquire better situation awareness, better communication ability, and better decision-making ability. In this talk, we describe three projects. In the first study, we enable a robot to utilize both vision and audio information to achieve better user understanding [1]. Then we use incremental language generation to improve the robot's communication with a human. In the second study, we utilize multimodal history tracking to optimize policy planning in task-oriented visual dialogs. In the third project, we tackle the well-known trade-off between dialog response relevance and policy effectiveness in visual dialog generation. We propose a new machine learning procedure that alternates from supervised learning and reinforcement learning to optimum language generation and policy planning jointly in visual dialogs [2]. We will also cover some recent ongoing work on image synthesis through dialogs.","PeriodicalId":142947,"journal":{"name":"Proceedings of the 1st International Workshop on Multimodal Conversational AI","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124786813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
FUN-Agent: A 2020 HUMAINE Competition Entrant
Pub Date: 2020-10-16 | DOI: 10.1145/3423325.3423736
R. Geraghty, James Hale, S. Sen, Timothy S. Kroecker
{"title":"FUN-Agent: A 2020 HUMAINE Competition Entrant","authors":"R. Geraghty, James Hale, S. Sen, Timothy S. Kroecker","doi":"10.1145/3423325.3423736","DOIUrl":"https://doi.org/10.1145/3423325.3423736","url":null,"abstract":"Of late, there has been a significant surge of interest in industry and the general populace about future potential of human-AI collaboration [20]. Academic researchers have been pushing the frontier of new modalities of peer-level and ad-hoc human agent collaboration [10;22] for a longer period. We have been particularly interested in research on agents representing human users in negotiating deals with other human and autonomous agents [12;16;18]. Here we present the design for the conversational aspect of our agent entry into the HUMAINE League of the 2020 Automated Negotiation Agent Competition (ANAC). We discuss how our agent utilizes conversational and negotiation strategies, that mimic those used in human negotiations, to maximize its utility as a simulated street vendor. We leverage verbal influence tactics, offer pricing, and increasing human convenience to entice the buyer, build trust and discourage exploitation. Additionally, we discuss the results of some in-house testing we conducted.","PeriodicalId":142947,"journal":{"name":"Proceedings of the 1st International Workshop on Multimodal Conversational AI","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123225051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Assisted Speech to Enable Second Language
Pub Date: 2020-10-16 | DOI: 10.1145/3423325.3423735
Mehmet Altinkaya, A. Smeulders
{"title":"Assisted Speech to Enable Second Language","authors":"Mehmet Altinkaya, A. Smeulders","doi":"10.1145/3423325.3423735","DOIUrl":"https://doi.org/10.1145/3423325.3423735","url":null,"abstract":"Speaking a second language (L2) is a desired capability for billionsof people. Currently, the only way to achieve it naturally is througha lengthy and tedious training, which ends up various stages offluency. The process is far away from the natural acquisition of alanguage.In this paper, we propose a system that enables any person withsome basic understanding of L2 speak fluently through \"Instant As-sistance\" provided by digital conversational agents such as GoogleAssistant, Microsoft Cortana, or Apple Siri, which monitors thespeaker. It attends to provide assistance to continue to speak whenspeech is interrupted as it is not yet completely mastered. The notyet acquired elements of language can be missing words, unfa-miliarity with expressions, the implicit rules of articles, and thehabits of sayings. We can employ the hardware and software of theassistants to create an immersive, adaptive learning environmentto train the speaker online by a symbiotic interaction for implicit,unnoticeable correction.","PeriodicalId":142947,"journal":{"name":"Proceedings of the 1st International Workshop on Multimodal Conversational AI","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125822880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Motivation and Design of the Conversational Components of DraftAgent for Human-Agent Negotiation
Pub Date: 2020-10-16 | DOI: 10.1145/3423325.3423734
Dale Peasley, Michael Naguib, Bohan Xu, S. Sen, Timothy S. Kroecker
{"title":"Motivation and Design of the Conversational Components of DraftAgent for Human-Agent Negotiation","authors":"Dale Peasley, Michael Naguib, Bohan Xu, S. Sen, Timothy S. Kroecker","doi":"10.1145/3423325.3423734","DOIUrl":"https://doi.org/10.1145/3423325.3423734","url":null,"abstract":"In sync with the significant interest in industry and the general populace about future potential of human-AI collaboration [14], academic researchers have been pushing the frontier of new modalities of peer-level and ad-hoc human agent collaboration [4,15]. We have been particularly interested in research on agents representing human users in negotiating deals with other human and autonomous agents [6,11,13]. We present the design motivation and key components of the conversational aspect of our agent entry into the Human-Agent League(HAL) (http://web.tuat.ac.jp/~katfuji/ANAC2020/cfp/ham_cfp.pdf )of the 2020 Automated Negotiation Agent Competition (ANAC). We explore how language can be used to promote human-agent collaboration even in the domain of a competitive negotiation. We present small scale in-lab testing to demonstrate the potential of our approach.","PeriodicalId":142947,"journal":{"name":"Proceedings of the 1st International Workshop on Multimodal Conversational AI","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127690656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
A Dynamic, Self Supervised, Large Scale AudioVisual Dataset for Stuttered Speech
Pub Date: 2020-10-16 | DOI: 10.1145/3423325.3423733
Mehmet Altinkaya, A. Smeulders
{"title":"A Dynamic, Self Supervised, Large Scale AudioVisual Dataset for Stuttered Speech","authors":"Mehmet Altinkaya, A. Smeulders","doi":"10.1145/3423325.3423733","DOIUrl":"https://doi.org/10.1145/3423325.3423733","url":null,"abstract":"Stuttering affects at least 1% of the world population. It is caused by irregular disruptions in speech production. These interruptions occur in various forms and frequencies. Repetition of words or parts of words, prolongations, or blocks in getting the words out are the most common ones. Accurate detection and classification of stuttering would be important in the assessment of severity for speech therapy. Furthermore, real time detection might create many new possibilities to facilitate reconstruction into fluent speech. Such an interface could help people to utilize voice-based interfaces like Apple Siri and Google Assistant, or to make (video) phone calls more fluent by delayed delivery. In this paper we present the first expandable audio-visual database of stuttered speech. We explore an end-to-end, real-time, multi-modal model for detection and classification of stuttered blocks in unbound speech. We also make use of video signals since acoustic signals cannot be produced immediately. We use multiple modalities as acoustic signals together with secondary characteristics exhibited in visual signals will permit an increased accuracy of detection.","PeriodicalId":142947,"journal":{"name":"Proceedings of the 1st International Workshop on Multimodal Conversational AI","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131247702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
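The abstract above outlines an end-to-end multi-modal detector that fuses acoustic and visual streams. Below is a minimal late-fusion sketch in PyTorch, assuming precomputed per-window audio and visual feature sequences; the feature dimensions, GRU encoders, and four-class disfluency label set are illustrative assumptions, not the authors' architecture.

# Hedged sketch: late fusion of audio and visual feature sequences for
# window-level stutter-block classification.
import torch
import torch.nn as nn

class AudioVisualStutterClassifier(nn.Module):
    def __init__(self, audio_dim=128, visual_dim=256, hidden=64, n_classes=4):
        super().__init__()
        # One sequence encoder per modality.
        self.audio_enc = nn.GRU(audio_dim, hidden, batch_first=True)
        self.visual_enc = nn.GRU(visual_dim, hidden, batch_first=True)
        # Example label set: fluent, repetition, prolongation, block.
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, audio_seq, visual_seq):
        # audio_seq: (batch, T_audio, audio_dim); visual_seq: (batch, T_video, visual_dim)
        _, h_audio = self.audio_enc(audio_seq)
        _, h_visual = self.visual_enc(visual_seq)
        fused = torch.cat([h_audio[-1], h_visual[-1]], dim=-1)  # final hidden states
        return self.head(fused)  # logits over disfluency classes for this window

Because articulation effort can be visible before any sound is produced, a fused window-level classifier of this shape can in principle flag a block earlier than an audio-only model, which is the motivation the abstract gives for adding the video signal.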
Proceedings of the 1st International Workshop on Multimodal Conversational AI (front matter)
{"title":"Proceedings of the 1st International Workshop on Multimodal Conversational AI","authors":"","doi":"10.1145/3423325","DOIUrl":"https://doi.org/10.1145/3423325","url":null,"abstract":"","PeriodicalId":142947,"journal":{"name":"Proceedings of the 1st International Workshop on Multimodal Conversational AI","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124322834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0