Utilizing multimodal cues to automatically evaluate public speaking performance

2015 International Conference on Affective Computing and Intelligent Interaction (ACII); Pub Date: 2015-09-21; DOI: 10.1109/ACII.2015.7344601

L. Chen, C. W. Leong, G. Feng, Chong Min Lee, Swapna Somasundaran

Pages: 394-400

Citations: 26

Abstract

Public speaking, an important type of oral communication, is critical to success in both learning and career development. However, tools for evaluating presenters' verbal and nonverbal behaviors efficiently and economically are lacking. Recent advances in automated scoring and multimodal sensing technologies may address this gap. We report a study on the development of an automated scoring model for public speaking performance using multimodal cues. A multimodal corpus of 56 presentations by 14 subjects was recorded using a Microsoft Kinect depth camera. Task design, rubric development, and human rating followed standards in educational assessment. A rich set of multimodal features was extracted from head poses, eye gazes, facial expressions, motion traces, speech signals, and transcripts. The model-building experiment shows that jointly using lexical/speech and visual features achieves more accurate scoring, which suggests that multimodal technologies are feasible for assessing public speaking skills.
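The idea of jointly using lexical/speech and visual features can be illustrated with a minimal sketch. It is not the authors' model. It assumes feature-level (early) fusion by concatenating per-presentation feature vectors, and it assumes Pearson correlation as one common way to compare machine scores with human ratings; the abstract names neither choice. The function names `fuse_features` and `pearson` and all numeric values are hypothetical, made up for illustration only.

```python
from math import sqrt


def fuse_features(speech_feats, visual_feats):
    """Early fusion (an assumption): concatenate the two modality
    vectors for each presentation into one joint feature vector."""
    if len(speech_feats) != len(visual_feats):
        raise ValueError("both modalities must cover the same presentations")
    return [s + v for s, v in zip(speech_feats, visual_feats)]


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists,
    e.g. machine-predicted scores versus human ratings."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Toy, made-up feature values for three presentations (not from the corpus).
speech = [[0.2, 1.5], [0.4, 1.1], [0.9, 0.7]]  # hypothetical lexical/speech features
visual = [[0.3], [0.5], [0.8]]                 # hypothetical visual features
fused = fuse_features(speech, visual)          # one joint vector per presentation
```

In a setup like this, the fused vectors would feed a scoring model, and `pearson` would compare its predictions with the human ratings for speech-only, visual-only, and fused inputs.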
