Effects of Good Speaking Techniques on Audience Engagement

Proceedings of the 2015 ACM on International Conference on Multimodal Interaction. Pub Date: 2015-11-09. DOI: 10.1145/2818346.2820766

Keith Curtis, G. Jones, N. Campbell


Citations: 23

Abstract

Understanding audience engagement levels for presentations has the potential to enable richer and more focused interaction with audio-visual recordings. We describe an investigation into automated analysis of multimodal recordings of scientific talks where the use of modalities most typically associated with engagement, such as eye-gaze, is not feasible. We first study visual and acoustic features to identify those most commonly associated with good speaking techniques. To understand audience interpretation of good speaking techniques, we engaged human annotators to rate the qualities of the speaker for a series of 30-second video segments taken from a corpus of 9 hours of presentations from an academic conference. Our annotators also watched corresponding video recordings of the audience at the presentations to estimate the level of audience engagement for each talk. We then explored the effectiveness of multimodal features extracted from the presentation video against Likert-scale ratings of each speaker as assigned by the annotators, and against manually labelled audience engagement levels. These features were used to build a classifier to rate the qualities of a new speaker. This classifier was able to assign a rating to a presenter over an 8-class range with an accuracy of 52%. Combining these classes into a 4-class range increases accuracy to 73%. We analyse linear correlations between individual speaker-based modalities and actual audience engagement levels to understand the corresponding effect on audience engagement. A further classifier was then built to predict the level of audience engagement with a presentation by analysing the speaker's use of acoustic and visual cues. Using these speaker-based modalities pre-fused with speaker ratings only, we are able to predict actual audience engagement levels with an accuracy of 68%. By combining these with basic visual features from the audience as a whole, we are able to improve this to an accuracy of 70%.
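The gain from moving to a coarser rating scale can be illustrated with a small sketch. It assumes, purely for illustration, that the 8 classes are merged by pairing adjacent ratings (1-2, 3-4, 5-6, 7-8); the abstract does not say how the classes were combined. The function names and the ratings below are hypothetical and are not data from the paper.

```python
from typing import Sequence


def merge_classes(label: int, n_fine: int = 8, n_coarse: int = 4) -> int:
    """Map a rating in 1..n_fine to 1..n_coarse by grouping adjacent ratings.

    Assumption: equal-width grouping (e.g. 1-2 -> 1, 3-4 -> 2, ...).
    """
    if n_fine % n_coarse != 0:
        raise ValueError("n_fine must be a multiple of n_coarse")
    if not 1 <= label <= n_fine:
        raise ValueError(f"label {label} outside 1..{n_fine}")
    width = n_fine // n_coarse
    return (label - 1) // width + 1


def accuracy(truth: Sequence[int], predicted: Sequence[int]) -> float:
    """Fraction of positions where the predicted label equals the true label."""
    if len(truth) != len(predicted) or not truth:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)


# Made-up annotator ratings and classifier outputs on an 8-point scale.
truth = [1, 2, 3, 4, 5, 6, 7, 8]
predicted = [2, 2, 3, 3, 5, 7, 7, 8]

fine_acc = accuracy(truth, predicted)
coarse_acc = accuracy(
    [merge_classes(t) for t in truth],
    [merge_classes(p) for p in predicted],
)

print(f"8-class accuracy: {fine_acc:.3f}")
print(f"4-class accuracy: {coarse_acc:.3f}")
```

In this toy example, predictions that miss by one rating but fall in the same merged pair count as correct after merging, which is one plausible reason accuracy rises on the coarser scale.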

