Intent based Multimodal Speech and Gesture Fusion for Human-Robot Communication in Assembly Situation

Sheuli Paul, Michael Sintek, Veton Këpuska, M. Silaghi, Liam Robertson

2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA), December 2022. DOI: 10.1109/ICMLA55696.2022.00127
Understanding intent is an essential step in maintaining effective communication. This capability is used in communication for assembly, patrolling, and surveillance tasks. This paper presents a fused, interactive multimodal system for human-robot communication in assembly applications. Communication is multimodal: having multiple communication modes to choose from, such as gestures, text, symbols, graphics, images, and speech, increases the chance of effective communication. Intent is the main component we aim to model, specifically in human-machine dialogues. To this end, we extract intents from spoken dialogue and fuse each intent with any detected matching gesture used in interaction with the robot. The main components of the presented system are: (1) a speech recognition system using Kaldi, (2) a deep-learning-based Dual Intent and Entity Transformer (DIET) classifier for intent and entity extraction, (3) a hand gesture recognition system, and (4) a dynamic fusion model for speech- and gesture-based communication. These components are evaluated in a contextual assembly situation using a simulated interactive robot.
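The abstract does not describe the fusion model in detail. The sketch below is a minimal, hypothetical illustration of the general idea of combining an extracted speech intent with a detected gesture; all names (SpeechIntent, Gesture, fuse, GESTURE_TO_INTENT) and the confidence-weighting scheme are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch: fusing a speech intent with a matching gesture.
# Names and the weighting scheme are illustrative assumptions only.
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeechIntent:
    name: str               # e.g. "pick_up", as produced by an intent classifier
    entity: Optional[str]   # e.g. "red_block", an extracted entity
    confidence: float


@dataclass
class Gesture:
    label: str              # e.g. "point", from a gesture recognizer
    confidence: float


# Assumed mapping from gesture labels to the intents they can reinforce.
GESTURE_TO_INTENT = {
    "point": "pick_up",
    "stop_hand": "halt",
    "thumbs_up": "confirm",
}


def fuse(intent: SpeechIntent, gesture: Optional[Gesture],
         min_confidence: float = 0.5) -> dict:
    """Combine a speech intent with a detected gesture, if one matches.

    If the gesture maps to the same intent, boost the overall confidence;
    otherwise fall back to the speech intent alone.
    """
    fused = {"intent": intent.name, "entity": intent.entity,
             "confidence": intent.confidence, "source": "speech"}

    if gesture and gesture.confidence >= min_confidence:
        if GESTURE_TO_INTENT.get(gesture.label) == intent.name:
            # Simple weighted combination of the two modality confidences.
            fused["confidence"] = 0.6 * intent.confidence + 0.4 * gesture.confidence
            fused["source"] = "speech+gesture"
    return fused


if __name__ == "__main__":
    intent = SpeechIntent(name="pick_up", entity="red_block", confidence=0.82)
    gesture = Gesture(label="point", confidence=0.91)
    print(fuse(intent, gesture))
    # {'intent': 'pick_up', 'entity': 'red_block', 'confidence': 0.856, 'source': 'speech+gesture'}
```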