Video emotional description with fact reinforcement and emotion awaking

Region 3, Computer Science · JCR Q1, Computer Science
Pengjie Tang, Hong Rao, Ai Zhang, Yunlan Tan
{"title":"带有事实强化和情感唤醒功能的视频情感描述","authors":"Pengjie Tang, Hong Rao, Ai Zhang, Yunlan Tan","doi":"10.1007/s12652-024-04779-x","DOIUrl":null,"url":null,"abstract":"<p>Video description aims to translate the visual content in a video with appropriate natural language. Most of current works only focus on the description of factual content, paying insufficient attention to the emotions in the video. And the sentences always lack flexibility and vividness. In this work, a fact enhancement and emotion awakening based model is proposed to describe the video, making the sentence more attractive and colorful. The strategy of deep incremental leaning is employed to build a multi-layer sequential network firstly, and multi-stage training method is used to sufficiently optimize the model. Secondly, the modules of fact inspiration, fact reinforcement and emotion awakening are constructed layer by layer to discovery more facts and embed emotions naturally. The three modules are cumulatively trained to sufficiently mine the factual and emotional information. Two public datasets including EmVidCap-S and EmVidCap are employed to evaluate the proposed model. The experimental results show that the performance of the proposed model is superior to not only the baseline models, but also the other popular methods.</p>","PeriodicalId":14959,"journal":{"name":"Journal of Ambient Intelligence and Humanized Computing","volume":"23 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Video emotional description with fact reinforcement and emotion awaking\",\"authors\":\"Pengjie Tang, Hong Rao, Ai Zhang, Yunlan Tan\",\"doi\":\"10.1007/s12652-024-04779-x\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Video description aims to translate the visual content in a video with appropriate natural language. Most of current works only focus on the description of factual content, paying insufficient attention to the emotions in the video. And the sentences always lack flexibility and vividness. In this work, a fact enhancement and emotion awakening based model is proposed to describe the video, making the sentence more attractive and colorful. The strategy of deep incremental leaning is employed to build a multi-layer sequential network firstly, and multi-stage training method is used to sufficiently optimize the model. Secondly, the modules of fact inspiration, fact reinforcement and emotion awakening are constructed layer by layer to discovery more facts and embed emotions naturally. The three modules are cumulatively trained to sufficiently mine the factual and emotional information. Two public datasets including EmVidCap-S and EmVidCap are employed to evaluate the proposed model. 
The experimental results show that the performance of the proposed model is superior to not only the baseline models, but also the other popular methods.</p>\",\"PeriodicalId\":14959,\"journal\":{\"name\":\"Journal of Ambient Intelligence and Humanized Computing\",\"volume\":\"23 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-04-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Ambient Intelligence and Humanized Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s12652-024-04779-x\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"Computer Science\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Ambient Intelligence and Humanized Computing","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s12652-024-04779-x","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"Computer Science","Score":null,"Total":0}
Citations: 0

Abstract


Video description aims to translate the visual content of a video into appropriate natural language. Most current works focus only on describing factual content and pay insufficient attention to the emotions in the video, so the generated sentences often lack flexibility and vividness. In this work, a model based on fact enhancement and emotion awakening is proposed to describe videos, making the sentences more attractive and colorful. First, a strategy of deep incremental learning is employed to build a multi-layer sequential network, and a multi-stage training method is used to sufficiently optimize the model. Second, the modules of fact inspiration, fact reinforcement and emotion awakening are constructed layer by layer to discover more facts and embed emotions naturally. The three modules are cumulatively trained to sufficiently mine the factual and emotional information. Two public datasets, EmVidCap-S and EmVidCap, are employed to evaluate the proposed model. The experimental results show that the proposed model outperforms not only the baseline models but also other popular methods.
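
The abstract gives only a high-level view of the method, so the following is a minimal, hypothetical PyTorch sketch of the idea it describes: a multi-layer sequential (LSTM) captioner whose three decoding layers stand in for the fact inspiration, fact reinforcement and emotion awakening modules, trained cumulatively in three stages. All class names, dimensions and training settings here are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of cumulative, stage-wise training of a three-layer
# sequential captioner (not the authors' implementation).
import torch
import torch.nn as nn


class DecoderLayer(nn.Module):
    """One LSTM decoding layer; stands in for one module of the stack."""

    def __init__(self, in_dim, hid_dim, vocab_size):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hid_dim, batch_first=True)
        self.head = nn.Linear(hid_dim, vocab_size)

    def forward(self, step_inputs):
        h, _ = self.lstm(step_inputs)      # (B, L, hid_dim)
        return h, self.head(h)             # hidden states, word logits


class IncrementalCaptioner(nn.Module):
    def __init__(self, feat_dim=2048, emb_dim=300, hid_dim=512, vocab_size=10000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        in_dim = feat_dim + emb_dim + hid_dim   # clip feature + word + lower hidden
        self.fact_inspiration = DecoderLayer(in_dim, hid_dim, vocab_size)
        self.fact_reinforcement = DecoderLayer(in_dim, hid_dim, vocab_size)
        self.emotion_awakening = DecoderLayer(in_dim, hid_dim, vocab_size)
        self.hid_dim = hid_dim

    def forward(self, video_feats, captions, depth=3):
        # video_feats: (B, T, feat_dim); captions: (B, L) word ids (teacher forcing)
        B, L = captions.shape
        clip = video_feats.mean(dim=1, keepdim=True).expand(-1, L, -1)
        words = self.embed(captions)
        hidden = video_feats.new_zeros(B, L, self.hid_dim)
        logits = None
        layers = [self.fact_inspiration, self.fact_reinforcement, self.emotion_awakening]
        for layer in layers[:depth]:       # deeper layers reuse the lower hidden states
            hidden, logits = layer(torch.cat([clip, words, hidden], dim=-1))
        return logits


def train_cumulatively(model, loader, vocab_size, epochs_per_stage=10):
    """One stage per layer: first the fact inspiration layer alone, then the
    first two layers, then all three (cumulative optimization)."""
    criterion = nn.CrossEntropyLoss(ignore_index=0)        # 0 assumed to be padding
    for depth in (1, 2, 3):
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
        for _ in range(epochs_per_stage):
            for video_feats, captions in loader:           # captions: (B, L) word ids
                logits = model(video_feats, captions[:, :-1], depth=depth)
                loss = criterion(logits.reshape(-1, vocab_size),
                                 captions[:, 1:].reshape(-1))
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
```

In this reading, "cumulative training" simply means each stage re-optimizes the newly added layer together with the layers already trained; the actual method may freeze earlier layers or use different losses per stage, which the abstract does not specify.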

Source journal: Journal of Ambient Intelligence and Humanized Computing (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, INFORMATION SYSTEMS)
CiteScore: 9.60
Self-citation rate: 0.00%
Articles published: 854
Journal description: The purpose of JAIHC is to provide a high-profile, leading-edge forum for academics, industrial professionals, educators and policy makers involved in the field to contribute to and disseminate the most innovative research and developments in all aspects of ambient intelligence and humanized computing, such as intelligent/smart objects, environments/spaces, and systems. The journal discusses various technical, safety, personal, social, physical, political, artistic and economic issues. The research topics covered by the journal include (but are not limited to): Pervasive/Ubiquitous Computing and Applications; Cognitive Wireless Sensor Networks; Embedded Systems and Software; Mobile Computing and Wireless Communications; Next Generation Multimedia Systems; Security, Privacy and Trust; Service and Semantic Computing; Advanced Networking Architectures; Dependable, Reliable and Autonomic Computing; Embedded Smart Agents; Context Awareness, Social Sensing and Inference; Multi-modal Interaction Design; Ergonomics and Product Prototyping; Intelligent and Self-organizing Transportation Networks & Services; Healthcare Systems; Virtual Humans & Virtual Worlds; Wearable Sensors and Actuators.