CLiF-VQA: Enhancing Video Quality Assessment by Incorporating High-Level Semantic Information related to Human Feelings

Mi, Yachun, Li, Yu, Shu, Yan, Hui, Chen, Zhou, Puchao, Liu, Shaohui
{"title":"CLiF-VQA: Enhancing Video Quality Assessment by Incorporating High-Level\n Semantic Information related to Human Feelings","authors":"Mi, Yachun, Li, Yu, Shu, Yan, Hui, Chen, Zhou, Puchao, Liu, Shaohui","doi":"10.48550/arxiv.2311.07090","DOIUrl":null,"url":null,"abstract":"Video Quality Assessment (VQA) aims to simulate the process of perceiving video quality by the human visual system (HVS). The judgments made by HVS are always influenced by human subjective feelings. However, most of the current VQA research focuses on capturing various distortions in the spatial and temporal domains of videos, while ignoring the impact of human feelings. In this paper, we propose CLiF-VQA, which considers both features related to human feelings and spatial features of videos. In order to effectively extract features related to human feelings from videos, we explore the consistency between CLIP and human feelings in video perception for the first time. Specifically, we design multiple objective and subjective descriptions closely related to human feelings as prompts. Further we propose a novel CLIP-based semantic feature extractor (SFE) which extracts features related to human feelings by sliding over multiple regions of the video frame. In addition, we further capture the low-level-aware features of the video through a spatial feature extraction module. The two different features are then aggregated thereby obtaining the quality score of the video. Extensive experiments show that the proposed CLiF-VQA exhibits excellent performance on several VQA datasets.","PeriodicalId":496270,"journal":{"name":"arXiv (Cornell University)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2023-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv (Cornell University)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arxiv.2311.07090","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Video Quality Assessment (VQA) aims to simulate the process of perceiving video quality by the human visual system (HVS). The judgments made by HVS are always influenced by human subjective feelings. However, most of the current VQA research focuses on capturing various distortions in the spatial and temporal domains of videos, while ignoring the impact of human feelings. In this paper, we propose CLiF-VQA, which considers both features related to human feelings and spatial features of videos. In order to effectively extract features related to human feelings from videos, we explore the consistency between CLIP and human feelings in video perception for the first time. Specifically, we design multiple objective and subjective descriptions closely related to human feelings as prompts. Further we propose a novel CLIP-based semantic feature extractor (SFE) which extracts features related to human feelings by sliding over multiple regions of the video frame. In addition, we further capture the low-level-aware features of the video through a spatial feature extraction module. The two different features are then aggregated thereby obtaining the quality score of the video. Extensive experiments show that the proposed CLiF-VQA exhibits excellent performance on several VQA datasets.
cliff - vqa:通过结合与人类情感相关的高级语义信息来增强视频质量评估
视频质量评估(VQA)旨在模拟人类视觉系统(HVS)感知视频质量的过程。HVS的判断总是受到人的主观感受的影响。然而,目前的VQA研究大多侧重于捕捉视频空间和时间域的各种扭曲,而忽略了人类情感的影响。在本文中,我们提出了cliff - vqa,它同时考虑了与人类情感相关的特征和视频的空间特征。为了有效地从视频中提取与人类情感相关的特征,我们首次探索了CLIP与人类情感在视频感知中的一致性。具体来说,我们设计了多个与人类情感密切相关的客观和主观描述作为提示。进一步,我们提出了一种新的基于clip的语义特征提取器(SFE),它通过在视频帧的多个区域上滑动来提取与人类情感相关的特征。此外,我们通过空间特征提取模块进一步捕获视频的低级感知特征。然后将这两种不同的特征聚合,从而获得视频的质量分数。大量的实验表明,所提出的CLiF-VQA在多个VQA数据集上表现出优异的性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信