Semantic Attribute Enriched Storytelling from a Sequence of Images

Zainy M. Malakan, G. Hassan, M. Jalwana, Nayyer Aafaq, A. Mian
Published in: 2021 Digital Image Computing: Techniques and Applications (DICTA), 2021-11-01
DOI: 10.1109/DICTA52665.2021.9647213 (https://doi.org/10.1109/DICTA52665.2021.9647213)
Citations: 2

Abstract

Visual storytelling (VST) is the task of generating story-based sentences from an ordered sequence of images. Contemporary techniques suffer from several limitations, such as inadequate encapsulation of visual variance and weak capture of context across the input sequence. Consequently, the stories they generate often lack coherence, context, and semantic information. In this research, we devise a ‘Semantic Attribute Enriched Storytelling’ (SAES) framework to mitigate these issues. To that end, we first extract the visual features of the input image sequence and the noun entities present in the visual input by employing an off-the-shelf object detector. The two feature sets are concatenated to encapsulate the visual variance of the input sequence. The concatenated features are then passed through a bidirectional LSTM sequence encoder to capture the past and future context of the input image sequence, followed by an attention mechanism that enhances the discriminability of the input to the language model, a Mogrifier LSTM. Additionally, we incorporate semantic attributes (e.g., nouns) to complement the semantic context in the generated story. Detailed experimental and human evaluations are performed to establish the competitive performance of the proposed technique. We achieve up to a 1.4% improvement on the BLEU metric over recent state-of-the-art methods.
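The data flow described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The feature sizes, the multi-hot noun encoding, the dot-product form of attention, the random weights, and every function name below are assumptions made for illustration. The bidirectional LSTM encoder is omitted, so the concatenated features stand in for its output. The `mogrify` step follows the general Mogrifier idea, in which the input and the hidden state alternately rescale each other before the LSTM update, using an even number of rounds for simplicity.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def concat_features(visual, nouns):
    """Join each image's visual features with its noun-entity features."""
    return [v + n for v, n in zip(visual, nouns)]


def attend(encoded, query):
    """Dot-product attention (assumed form): weight each image by its
    similarity to the query and return the weighted-sum context vector."""
    scores = [sum(e * q for e, q in zip(vec, query)) for vec in encoded]
    weights = softmax(scores)
    dim = len(encoded[0])
    context = [sum(w * vec[d] for w, vec in zip(weights, encoded))
               for d in range(dim)]
    return weights, context


def mogrify(x, h, q_mats, r_mats):
    """Mogrifier-style gating: x is rescaled by a gate computed from h,
    then h is rescaled by a gate computed from the updated x."""
    for q, r in zip(q_mats, r_mats):
        x = [2.0 * sigmoid(g) * xi for g, xi in zip(matvec(q, h), x)]
        h = [2.0 * sigmoid(g) * hi for g, hi in zip(matvec(r, x), h)]
    return x, h


def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
NUM_IMAGES, VIS_DIM, NOUN_DIM, HID_DIM = 5, 4, 2, 3  # hypothetical sizes

visual = rand_matrix(NUM_IMAGES, VIS_DIM)             # stand-in visual features
nouns = [[random.choice([0.0, 1.0]) for _ in range(NOUN_DIM)]
         for _ in range(NUM_IMAGES)]                  # assumed multi-hot nouns
features = concat_features(visual, nouns)

query = [random.uniform(-1, 1) for _ in range(VIS_DIM + NOUN_DIM)]
weights, context = attend(features, query)

hidden = [random.uniform(-1, 1) for _ in range(HID_DIM)]
q_mats = [rand_matrix(VIS_DIM + NOUN_DIM, HID_DIM) for _ in range(2)]
r_mats = [rand_matrix(HID_DIM, VIS_DIM + NOUN_DIM) for _ in range(2)]
x_mod, h_mod = mogrify(context, hidden, q_mats, r_mats)
```

In a full model, `x_mod` and `h_mod` would feed a standard LSTM cell update that emits the next story word. With all-zero gate matrices, each gate equals 1, so `mogrify` leaves both vectors unchanged and the step reduces to a plain LSTM input.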