{"title":"利用高层次的特点总结学术报告","authors":"Keith Curtis, G. Jones, N. Campbell","doi":"10.1145/3078971.3079028","DOIUrl":null,"url":null,"abstract":"We present a novel method for the generation of automatic video summaries of academic presentations. We base our investigation on a corpus of multimodal academic conference presentations combining transcripts with paralinguistic multimodal features. We first generate summaries based on keywords by using transcripts created using automatic speech recognition (ASR). Start and end times for each spoken phrase are identified from the ASR transcript, then a value for each phrase created. Spoken phrases are then augmented by incorporating scores for human annotation of paralinguistic features. These features measure audience engagement, comprehension and speaker emphasis. We evaluate the effectiveness of summaries generated for individual presentations, created using speech transcripts and paralinguistic multimodal features, by performing eye-tracking evaluation of participants as they watch summaries and full presentations, and by questionnaire of participants upon completion of eye-tracking studies. Summaries were also evaluated for effectiveness by performing comparisons with an enhanced digital video browser.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"245 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Utilising High-Level Features in Summarisation of Academic Presentations\",\"authors\":\"Keith Curtis, G. Jones, N. Campbell\",\"doi\":\"10.1145/3078971.3079028\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We present a novel method for the generation of automatic video summaries of academic presentations. We base our investigation on a corpus of multimodal academic conference presentations combining transcripts with paralinguistic multimodal features. We first generate summaries based on keywords by using transcripts created using automatic speech recognition (ASR). Start and end times for each spoken phrase are identified from the ASR transcript, then a value for each phrase created. Spoken phrases are then augmented by incorporating scores for human annotation of paralinguistic features. These features measure audience engagement, comprehension and speaker emphasis. We evaluate the effectiveness of summaries generated for individual presentations, created using speech transcripts and paralinguistic multimodal features, by performing eye-tracking evaluation of participants as they watch summaries and full presentations, and by questionnaire of participants upon completion of eye-tracking studies. 
Summaries were also evaluated for effectiveness by performing comparisons with an enhanced digital video browser.\",\"PeriodicalId\":403556,\"journal\":{\"name\":\"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval\",\"volume\":\"245 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-06-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3078971.3079028\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3078971.3079028","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Utilising High-Level Features in Summarisation of Academic Presentations
We present a novel method for the automatic generation of video summaries of academic presentations. Our investigation is based on a corpus of multimodal academic conference presentations that combines speech transcripts with paralinguistic multimodal features. We first generate keyword-based summaries from transcripts produced by automatic speech recognition (ASR). Start and end times for each spoken phrase are identified from the ASR transcript, and a value is then computed for each phrase. These phrase values are then augmented with scores from human annotation of paralinguistic features, which measure audience engagement, audience comprehension, and speaker emphasis. We evaluate the effectiveness of the summaries generated for individual presentations from speech transcripts and paralinguistic multimodal features in two ways: by eye-tracking of participants as they watch summaries and full presentations, and by questionnaires completed by participants after the eye-tracking studies. Summary effectiveness was also evaluated through comparison with an enhanced digital video browser.
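To make the pipeline in the abstract concrete, the sketch below shows one way the phrase-scoring and selection stages could fit together. This is a minimal illustration, not the authors' implementation: the field names, the linear weighting of the keyword score with the three paralinguistic annotation scores, and the greedy duration-budget selection rule are all assumptions, since the paper's abstract does not specify the scoring formula.

```python
# Illustrative sketch only. The paper does not publish its scoring formula,
# so the weights, field names, and selection rule here are assumptions.
from dataclasses import dataclass


@dataclass
class Phrase:
    text: str
    start: float          # start time (seconds) from the ASR transcript
    end: float            # end time (seconds) from the ASR transcript
    keyword_score: float  # keyword-based value computed from the transcript
    engagement: float     # human-annotated audience engagement (0-1)
    comprehension: float  # human-annotated audience comprehension (0-1)
    emphasis: float       # human-annotated speaker emphasis (0-1)


def combined_score(p: Phrase, w=(0.4, 0.2, 0.2, 0.2)) -> float:
    """Augment the keyword score with the paralinguistic annotation
    scores. The weights are placeholders, not values from the paper."""
    return (w[0] * p.keyword_score + w[1] * p.engagement
            + w[2] * p.comprehension + w[3] * p.emphasis)


def summarise(phrases: list[Phrase], target_secs: float) -> list[Phrase]:
    """Greedily keep the highest-scoring phrases until the summary
    reaches the target duration, then restore chronological order."""
    chosen, used = [], 0.0
    for p in sorted(phrases, key=combined_score, reverse=True):
        dur = p.end - p.start
        if used + dur <= target_secs:
            chosen.append(p)
            used += dur
    return sorted(chosen, key=lambda p: p.start)


if __name__ == "__main__":
    demo = [
        Phrase("we present a novel method", 0.0, 3.2, 0.9, 0.8, 0.7, 0.9),
        Phrase("as you can see here", 3.2, 5.0, 0.1, 0.3, 0.4, 0.2),
        Phrase("our evaluation uses eye-tracking", 5.0, 9.1, 0.8, 0.7, 0.6, 0.5),
    ]
    for p in summarise(demo, target_secs=8.0):
        print(f"[{p.start:5.1f}-{p.end:5.1f}] {p.text}")
```

Restoring chronological order after greedy selection matters for video summaries: the chosen phrases are played back as excerpts of the original recording, so they must appear in presentation order rather than score order.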