{"title":"A Hybrid Video-to-Text Summarization Framework and Algorithm on Cascading Advanced Extractive- and Abstractive-based Approaches for Supporting Viewers' Video Navigation and Understanding","authors":"Aishwarya Ramakrishnan, Chun-Kit Ngan","doi":"10.1109/AIKE55402.2022.00012","DOIUrl":null,"url":null,"abstract":"In this work, we propose the development of a hybrid video-to-text summarization (VTS) framework on cascading the advanced and code-accessible extractive and abstractive (EA) approaches for supporting viewers' video navigation and understanding. More precisely, the contributions of this paper are three-fold. First, we devise an automated and unified hybrid VTS framework that takes an arbitrary video as an input, generates the text transcripts from its human dialogues, and then summarizes the text transcripts into one short video synopsis. Second, we advance the binary merge-sort approach and expand its use to develop an intuitive and heuristic abstractive-based algorithm, with the time complexity $O(T_{L}logT_{L})$ and the space complexity $O(T_{L})$, where TL is the total number of word tokens on a text, to dynamically and successively split and merge a long piece of text transcripts, which exceeds the input text size limitation of an abstractive model, to generate one final semantic video synopsis. At the end, we test the feasibility of applying this proposed framework and algorithm in conducting the preliminarily experimental evaluations on three different videos, as a pilot study, in genres, contents, and lengths. We show that our approach outperforms and/or levels most of the individual EA methods stated above by 75% in terms of the ROUGE F1-Score measurement.","PeriodicalId":441077,"journal":{"name":"2022 IEEE Fifth International Conference on Artificial Intelligence and Knowledge Engineering (AIKE)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE Fifth International Conference on Artificial Intelligence and Knowledge Engineering (AIKE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/AIKE55402.2022.00012","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Hybrid Video-to-Text Summarization Framework and Algorithm on Cascading Advanced Extractive- and Abstractive-based Approaches for Supporting Viewers' Video Navigation and Understanding
In this work, we propose a hybrid video-to-text summarization (VTS) framework that cascades advanced, code-accessible extractive and abstractive (EA) approaches to support viewers' video navigation and understanding. The contributions of this paper are three-fold. First, we devise an automated, unified hybrid VTS framework that takes an arbitrary video as input, generates text transcripts from its human dialogues, and then summarizes those transcripts into one short video synopsis. Second, we adapt the binary merge-sort approach to develop an intuitive, heuristic abstractive-based algorithm, with time complexity $O(T_L \log T_L)$ and space complexity $O(T_L)$, where $T_L$ is the total number of word tokens in a text, that dynamically and successively splits and merges a long transcript, one that exceeds the input size limit of an abstractive model, to generate one final semantic video synopsis. Third, as a pilot study, we test the feasibility of the proposed framework and algorithm through preliminary experimental evaluations on three videos of different genres, contents, and lengths. We show that our approach outperforms or matches most of the individual EA methods in 75% of the ROUGE F1-score measurements.
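To make the merge-sort-style split-and-merge recursion concrete, here is a minimal Python sketch under stated assumptions: the `summarize` callable is a hypothetical placeholder for whatever abstractive model is used (the abstract does not name one), `MAX_TOKENS = 512` is an assumed input limit, and `split_merge_summarize` and the demo summarizer are illustrative names, not the authors' implementation. The sketch also assumes the summarizer shortens its input, which is what keeps the recursion bounded and yields the $O(T_L \log T_L)$ depth-times-work structure of binary merge sort.

```python
# Sketch of a merge-sort-style split-and-merge summarizer: recursively
# halve the token list until each chunk fits the model's input limit,
# summarize each chunk, then fuse and re-summarize the partial summaries.

from typing import Callable, List

MAX_TOKENS = 512  # assumed input limit of the abstractive model


def split_merge_summarize(
    tokens: List[str],
    summarize: Callable[[str], str],
    max_tokens: int = MAX_TOKENS,
) -> str:
    if len(tokens) <= max_tokens:
        # Base case: the chunk fits, so summarize it directly.
        return summarize(" ".join(tokens))

    # Split step: halve the token list, as in binary merge sort.
    mid = len(tokens) // 2
    left = split_merge_summarize(tokens[:mid], summarize, max_tokens)
    right = split_merge_summarize(tokens[mid:], summarize, max_tokens)

    # Merge step: fuse the two partial summaries into one synopsis.
    merged = (left + " " + right).split()
    if len(merged) <= max_tokens:
        return summarize(" ".join(merged))
    # If the fused text still exceeds the limit (assumes the summarizer
    # compresses its input, so this terminates), recurse once more.
    return split_merge_summarize(merged, summarize, max_tokens)


if __name__ == "__main__":
    # Trivial stand-in summarizer for demonstration: keep the first 20
    # words. A real system would call an abstractive model here.
    demo = lambda text: " ".join(text.split()[:20])
    transcript = ("some long transcript " * 500).split()  # 1500 tokens
    print(split_merge_summarize(transcript, demo))
```

In a real pipeline the `demo` lambda would be replaced by an abstractive model call, and the tokenization would match that model's tokenizer rather than whitespace splitting; the recursion structure is the point of the sketch.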