Summarization of Multiple News Videos Considering the Consistency of Audio-Visual Contents

Int. J. Semantic Comput. | Pub Date: 2019-04-03 | DOI: 10.1142/S1793351X19500016

Ye Zhang, Ryunosuke Tanishige, I. Ide, Keisuke Doman, Yasutomo Kawanishi, Daisuke Deguchi, H. Murase


Citations: 0


Summarization of Multiple News Videos Considering the Consistency of Audio-Visual Contents

News videos are a valuable source of multimedia information on real-world events. However, because news coverage is incremental, a sequence of news videos on a related news topic can be redundant and lengthy. A number of methods have therefore been proposed to summarize such sequences, but most of them do not consider the consistency between the auditory and visual contents. This matters for news videos because the two contents do not always come from the same source. In this paper, we propose a method for summarizing a sequence of news videos that takes the consistency of auditory and visual contents into account. The proposed method first selects key sentences from the auditory contents (closed captions) of each news story in the sequence, and then, for each selected key sentence, selects the shot in the news story whose “Visual Concepts”, detected from the visual contents, are most consistent with that sentence. Finally, the audio segment corresponding to each key sentence is combined with the selected shot, and the resulting clips are concatenated into a summarized video. Subjective experiments on summarized videos for several news topics show the effectiveness of the proposed method.
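The pipeline described in the abstract (key-sentence selection, then consistency-based shot selection, then concatenation) can be illustrated with a minimal Python sketch. The abstract does not specify how sentences are scored or how consistency is measured, so everything concrete here is an assumption made for illustration: key sentences are ranked by average term frequency within the story, consistency is a Jaccard overlap between sentence words and a shot's concept labels, and the names `Shot`, `select_key_sentence`, `consistency`, and `summarize` are hypothetical. Audio and video synthesis is left out; the sketch only returns which shot would be paired with which sentence.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Shot:
    shot_id: str
    concepts: Set[str]  # labels from a hypothetical visual concept detector


def tokenize(text: str) -> List[str]:
    # Lowercase each word and strip surrounding punctuation.
    words = (w.strip(".,!?\"'").lower() for w in text.split())
    return [w for w in words if w]


def select_key_sentence(sentences: List[str]) -> str:
    # Assumed stand-in scoring (not the paper's): average story-level
    # term frequency of the words in each sentence.
    tf = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence: str) -> float:
        words = tokenize(sentence)
        return sum(tf[w] for w in words) / len(words) if words else 0.0

    return max(sentences, key=score)


def consistency(sentence: str, shot: Shot) -> float:
    # Assumed stand-in measure (not the paper's): Jaccard overlap between
    # the sentence's words and the shot's concept labels.
    words = set(tokenize(sentence))
    union = words | shot.concepts
    return len(words & shot.concepts) / len(union) if union else 0.0


def summarize(stories: List[Tuple[List[str], List[Shot]]]) -> List[Tuple[str, str]]:
    # For each news story, pair its key sentence with the most consistent
    # shot. The returned list is in story order, which corresponds to the
    # order in which the clips would be concatenated.
    summary = []
    for sentences, shots in stories:
        key = select_key_sentence(sentences)
        best = max(shots, key=lambda shot: consistency(key, shot))
        summary.append((key, best.shot_id))
    return summary
```

A real system would replace both scoring functions with the measures defined in the full paper and would then cut the caption-aligned audio segment and mux it with the chosen shot before concatenation.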

