Addressee Detection Using Facial and Audio Features in Mixed Human–Human and Human–Robot Settings: A Deep Learning Framework

IF 1.9 Q3 COMPUTER SCIENCE, CYBERNETICS

IEEE Systems Man and Cybernetics Magazine Pub Date: 2023-04-01 DOI: 10.1109/MSMC.2022.3224843

Fiseha B. Tesema, J. Gu, Wei Song, Hong-Chuan Wu, Shiqiang Zhu, Zheyuan Lin, Min Huang, Wen Wang, R. Kumar

Volume: 13 1 Pages: 25-38 Publication type: Journal Article

Citations: 0

Abstract


Addressee Detection Using Facial and Audio Features in Mixed Human–Human and Human–Robot Settings: A Deep Learning Framework

Addressee detection (AD) enables a robot to interact smoothly with humans by determining whether it is being addressed. However, the problem has not been widely explored. The few studies in this area focused on human-to-human or human-to-robot conversations confined to a meeting room, using gaze and utterance as cues. These works used statistical and rule-based approaches, which tend to depend on specific settings. Further, they did not fully leverage the available audio and visual information or both short-term and long-term segments, and they did not explore combining two important conversational cues: facial and audio features. In addition, no audiovisual, spatiotemporally annotated dataset captured in mixed human-to-human and human-to-robot settings is available to support exploring the area with new approaches.


Source journal

IEEE Systems Man and Cybernetics Magazine (COMPUTER SCIENCE, CYBERNETICS)

Self-citation rate

6.20%

Articles published