Dual-stage attention based symmetric framework for stereo video quality assessment

Impact Factor: 3.4 · JCR Q1 (Computer Science, Hardware & Architecture) · CAS Region 2 (Engineering & Technology)
Kairui Zhang, Xiao Ke, Xin Chen
{"title":"Dual-stage attention based symmetric framework for stereo video quality assessment","authors":"Kairui Zhang ,&nbsp;Xiao Ke ,&nbsp;Xin Chen","doi":"10.1016/j.displa.2025.103232","DOIUrl":null,"url":null,"abstract":"<div><div>The compelling creative capabilities of stereo video have captured the attention of scholars towards its quality. Given the substantial challenge posed by asymmetric distortion in stereoscopic visual perception within the realm of stereoscopic video quality evaluation (SVQA), this study introduces the novel <span><math><mrow><msup><mrow><mi>D</mi></mrow><mrow><mn>3</mn></mrow></msup><mi>N</mi><mi>e</mi><mi>t</mi></mrow></math></span> (Dual Branch, dual-stage Attention, Dual Task) framework for stereoscopic video quality assessment. Leveraging its innovative dual-task architecture, <span><math><mrow><msup><mrow><mi>D</mi></mrow><mrow><mn>3</mn></mrow></msup><mi>N</mi><mi>e</mi><mi>t</mi></mrow></math></span> employs a dual-branch independent prediction mechanism for the left and right views. This approach not only effectively addresses the prevalent issue of asymmetric distortion in stereoscopic videos but also pinpoints which view drags the overall score down. To surmount the limitations of existing models in capturing global detail attention, <span><math><mrow><msup><mrow><mi>D</mi></mrow><mrow><mn>3</mn></mrow></msup><mi>N</mi><mi>e</mi><mi>t</mi></mrow></math></span> incorporates a two-stage distorted attention fusion module. This module enables multi-level fusion of video features at both block and pixel levels, bolstering the model’s attention towards global details and its processing capabilities, consequently enhancing the overall performance of the model. <span><math><mrow><msup><mrow><mi>D</mi></mrow><mrow><mn>3</mn></mrow></msup><mi>N</mi><mi>e</mi><mi>t</mi></mrow></math></span> has exhibited exceptional performance across mainstream and cross-domain datasets, establishing itself as the current state-of-the-art (SOTA) technology.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"91 ","pages":"Article 103232"},"PeriodicalIF":3.4000,"publicationDate":"2025-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Displays","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0141938225002690","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Citations: 0

Abstract

The compelling creative capabilities of stereo video have drawn scholars' attention to its quality. Given the substantial challenge that asymmetric distortion poses to stereoscopic visual perception in stereoscopic video quality assessment (SVQA), this study introduces the novel D3Net (Dual-Branch, Dual-stage Attention, Dual-Task) framework. Leveraging its dual-task architecture, D3Net employs a dual-branch mechanism that predicts quality independently for the left and right views. This approach not only addresses the prevalent issue of asymmetric distortion in stereoscopic videos but also pinpoints which view drags the overall score down. To overcome the limitations of existing models in capturing global detail, D3Net incorporates a two-stage distorted attention fusion module, which fuses video features at both the block and pixel levels, strengthening the model's attention to global detail and its processing capability and thereby improving overall performance. D3Net exhibits exceptional performance on mainstream and cross-domain datasets, establishing itself as the current state of the art (SOTA).
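To make the architecture described above concrete, the following is a minimal, hypothetical PyTorch sketch of a dual-branch, dual-stage-attention, dual-task model: each view is scored by its own branch, a coarse block-level attention is refined by a pixel-level attention, and the per-view scores are combined into an overall score. The module names (`ViewBranch`, `DualBranchSVQA`), the plain convolutional backbone, the sigmoid-gated attention stages, and the simple averaging fusion are all assumptions for illustration, not the paper's actual D3Net implementation.

```python
# Hypothetical sketch of a dual-branch, dual-stage-attention, dual-task SVQA model.
# Module names, feature sizes, and the fusion scheme are illustrative assumptions;
# they do not reproduce the paper's D3Net details.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BlockAttention(nn.Module):
    """Stage 1: coarse, block-level attention over pooled feature blocks."""
    def __init__(self, channels, block_size=4):
        super().__init__()
        self.pool = nn.AvgPool2d(block_size)           # summarize each block
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):
        weights = torch.sigmoid(self.score(self.pool(x)))           # (B,1,H/b,W/b)
        weights = F.interpolate(weights, size=x.shape[-2:], mode="nearest")
        return x * weights                                           # re-weight blocks


class PixelAttention(nn.Module):
    """Stage 2: fine, pixel-level attention refining the block-weighted features."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):
        return x * torch.sigmoid(self.score(x))


class ViewBranch(nn.Module):
    """One branch: backbone + two-stage attention + per-view quality head."""
    def __init__(self, channels=32):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, channels, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.block_attn = BlockAttention(channels)
        self.pixel_attn = PixelAttention(channels)
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(channels, 1))

    def forward(self, frames):                      # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        x = frames.flatten(0, 1)                    # fold time into the batch
        x = self.pixel_attn(self.block_attn(self.backbone(x)))
        return self.head(x).view(b, t).mean(dim=1)  # temporally averaged score


class DualBranchSVQA(nn.Module):
    """Dual task: per-view scores (to identify the weaker view) + overall score."""
    def __init__(self):
        super().__init__()
        self.left = ViewBranch()
        self.right = ViewBranch()

    def forward(self, left_frames, right_frames):
        q_left = self.left(left_frames)
        q_right = self.right(right_frames)
        overall = 0.5 * (q_left + q_right)          # assumed simple fusion
        return q_left, q_right, overall


if __name__ == "__main__":
    model = DualBranchSVQA()
    left = torch.randn(2, 4, 3, 64, 64)             # (batch, frames, C, H, W)
    right = torch.randn(2, 4, 3, 64, 64)
    ql, qr, q = model(left, right)
    print(ql.shape, qr.shape, q.shape)              # torch.Size([2]) each
```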
Source journal: Displays (Engineering: Electrical & Electronic)
CiteScore: 4.60
Self-citation rate: 25.60%
Articles published: 138
Review time: 92 days
Journal scope: Displays is the international journal covering the research and development of display technology, its effective presentation and perception of information, and applications and systems including the display-human interface. Technical papers on practical developments in display technology provide an effective channel to promote greater understanding and cross-fertilization across the diverse disciplines of the Displays community. Original research papers solving ergonomics issues at the display-human interface advance the effective presentation of information. Tutorial papers covering fundamentals, intended for display technologists and human-factors engineers new to the field, will also occasionally be featured.