{"title":"利用高层次的特点总结学术报告","authors":"Keith Curtis, G. Jones, N. Campbell","doi":"10.1145/3078971.3079028","DOIUrl":null,"url":null,"abstract":"We present a novel method for the generation of automatic video summaries of academic presentations. We base our investigation on a corpus of multimodal academic conference presentations combining transcripts with paralinguistic multimodal features. We first generate summaries based on keywords by using transcripts created using automatic speech recognition (ASR). Start and end times for each spoken phrase are identified from the ASR transcript, then a value for each phrase created. Spoken phrases are then augmented by incorporating scores for human annotation of paralinguistic features. These features measure audience engagement, comprehension and speaker emphasis. We evaluate the effectiveness of summaries generated for individual presentations, created using speech transcripts and paralinguistic multimodal features, by performing eye-tracking evaluation of participants as they watch summaries and full presentations, and by questionnaire of participants upon completion of eye-tracking studies. Summaries were also evaluated for effectiveness by performing comparisons with an enhanced digital video browser.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"245 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Utilising High-Level Features in Summarisation of Academic Presentations\",\"authors\":\"Keith Curtis, G. Jones, N. Campbell\",\"doi\":\"10.1145/3078971.3079028\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We present a novel method for the generation of automatic video summaries of academic presentations. We base our investigation on a corpus of multimodal academic conference presentations combining transcripts with paralinguistic multimodal features. We first generate summaries based on keywords by using transcripts created using automatic speech recognition (ASR). Start and end times for each spoken phrase are identified from the ASR transcript, then a value for each phrase created. Spoken phrases are then augmented by incorporating scores for human annotation of paralinguistic features. These features measure audience engagement, comprehension and speaker emphasis. We evaluate the effectiveness of summaries generated for individual presentations, created using speech transcripts and paralinguistic multimodal features, by performing eye-tracking evaluation of participants as they watch summaries and full presentations, and by questionnaire of participants upon completion of eye-tracking studies. 
Summaries were also evaluated for effectiveness by performing comparisons with an enhanced digital video browser.\",\"PeriodicalId\":403556,\"journal\":{\"name\":\"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval\",\"volume\":\"245 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-06-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3078971.3079028\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3078971.3079028","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Utilising High-Level Features in Summarisation of Academic Presentations
We present a novel method for the automatic generation of video summaries of academic presentations. Our investigation is based on a corpus of multimodal academic conference presentations that combines speech transcripts with paralinguistic multimodal features. We first generate keyword-based summaries from transcripts produced by automatic speech recognition (ASR). Start and end times for each spoken phrase are identified from the ASR transcript, and a value is then computed for each phrase. These phrase values are then augmented with scores from human annotation of paralinguistic features, which measure audience engagement, audience comprehension, and speaker emphasis. We evaluate the effectiveness of the summaries generated for individual presentations from speech transcripts and paralinguistic multimodal features in two ways: by eye-tracking of participants as they watch summaries and full presentations, and by questionnaires completed by participants after the eye-tracking studies. Summary effectiveness was also evaluated through comparison with an enhanced digital video browser.
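To make the pipeline in the abstract concrete, the sketch below shows one way the phrase-scoring and selection stages could fit together. This is a minimal illustration, not the authors' implementation: the field names, the linear weighting of the keyword score with the three paralinguistic annotation scores, and the greedy duration-budget selection rule are all assumptions, since the paper's abstract does not specify the scoring formula.

```python
# Illustrative sketch only. The paper does not publish its scoring formula,
# so the weights, field names, and selection rule here are assumptions.
from dataclasses import dataclass


@dataclass
class Phrase:
    text: str
    start: float          # start time (seconds) from the ASR transcript
    end: float            # end time (seconds) from the ASR transcript
    keyword_score: float  # keyword-based value computed from the transcript
    engagement: float     # human-annotated audience engagement (0-1)
    comprehension: float  # human-annotated audience comprehension (0-1)
    emphasis: float       # human-annotated speaker emphasis (0-1)


def combined_score(p: Phrase, w=(0.4, 0.2, 0.2, 0.2)) -> float:
    """Augment the keyword score with the paralinguistic annotation
    scores. The weights are placeholders, not values from the paper."""
    return (w[0] * p.keyword_score + w[1] * p.engagement
            + w[2] * p.comprehension + w[3] * p.emphasis)


def summarise(phrases: list[Phrase], target_secs: float) -> list[Phrase]:
    """Greedily keep the highest-scoring phrases until the summary
    reaches the target duration, then restore chronological order."""
    chosen, used = [], 0.0
    for p in sorted(phrases, key=combined_score, reverse=True):
        dur = p.end - p.start
        if used + dur <= target_secs:
            chosen.append(p)
            used += dur
    return sorted(chosen, key=lambda p: p.start)


if __name__ == "__main__":
    demo = [
        Phrase("we present a novel method", 0.0, 3.2, 0.9, 0.8, 0.7, 0.9),
        Phrase("as you can see here", 3.2, 5.0, 0.1, 0.3, 0.4, 0.2),
        Phrase("our evaluation uses eye-tracking", 5.0, 9.1, 0.8, 0.7, 0.6, 0.5),
    ]
    for p in summarise(demo, target_secs=8.0):
        print(f"[{p.start:5.1f}-{p.end:5.1f}] {p.text}")
```

Restoring chronological order after greedy selection matters for video summaries: the chosen phrases are played back as excerpts of the original recording, so they must appear in presentation order rather than score order.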