DCVQE: A Hierarchical Transformer for Video Quality Assessment

Computer vision - ACCV ... : ... Asian Conference on Computer Vision : proceedings. Asian Conference on Computer Vision. Pub Date: 2022-10-10. DOI: 10.48550/arXiv.2210.04377

Zu-Hua Li, Lei Yang


Citations: 1

Abstract

The explosion of user-generated videos has created great demand for no-reference video quality assessment (NR-VQA). Inspired by our observation of human annotation behavior, we put forward a Divide and Conquer Video Quality Estimator (DCVQE) for NR-VQA. Starting from frame-level quality embeddings (QE), our proposal splits the whole sequence into a number of clips and applies Transformers to learn the clip-level QE and update the frame-level QE simultaneously; another Transformer then combines the clip-level QE to generate the video-level QE. We call this hierarchical combination of Transformers a Divide and Conquer Transformer (DCTr) layer. Accurate video quality features can be extracted by repeating this DCTr layer several times. Taking the order relationship among the annotated data into account, we also propose a novel correlation loss term for model training. Experiments on various datasets confirm the effectiveness and robustness of our DCVQE model.
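The divide-and-conquer flow described in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: a simple softmax attention pooling stands in for each Transformer, the 50/50 residual rule for updating frame-level QE is an assumption, and all names (`attention_pool`, `split_into_clips`, `dctr_layer`, `dcvqe_features`, `clip_len`, `num_layers`) are hypothetical. It only shows the data flow: frames are grouped into clips, each clip yields a clip-level QE and updates its frames, and the clip-level QE are combined into a video-level QE, with the layer repeated several times.

```python
import math
from typing import List, Tuple

Vector = List[float]


def attention_pool(vectors: List[Vector]) -> Vector:
    """Softmax-weighted average of vectors.

    Assumption: a lightweight stand-in for a Transformer, not the paper's module.
    """
    dim = len(vectors[0])
    # Use the mean vector as the query; scores are scaled dot products with it.
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    scores = [sum(m * x for m, x in zip(mean, v)) / math.sqrt(dim) for v in vectors]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, vectors)) / total for i in range(dim)]


def split_into_clips(frames: List[Vector], clip_len: int) -> List[List[Vector]]:
    """Divide step: cut the frame sequence into consecutive clips."""
    return [frames[i:i + clip_len] for i in range(0, len(frames), clip_len)]


def dctr_layer(
    frames: List[Vector], clip_len: int
) -> Tuple[List[Vector], List[Vector], Vector]:
    """One divide-and-conquer pass.

    Returns (updated frame-level QE, clip-level QE, video-level QE).
    """
    updated_frames: List[Vector] = []
    clip_qe: List[Vector] = []
    for clip in split_into_clips(frames, clip_len):
        summary = attention_pool(clip)  # clip-level QE
        clip_qe.append(summary)
        # Assumed update rule: mix each frame equally with its clip summary.
        for frame in clip:
            updated_frames.append([0.5 * f + 0.5 * s for f, s in zip(frame, summary)])
    video_qe = attention_pool(clip_qe)  # conquer step: combine clip-level QE
    return updated_frames, clip_qe, video_qe


def dcvqe_features(frames: List[Vector], clip_len: int = 4, num_layers: int = 2) -> Vector:
    """Repeat the layer several times and return the final video-level QE."""
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    for _ in range(num_layers):
        frames, _clips, video_qe = dctr_layer(frames, clip_len)
    return video_qe
```

For example, `dcvqe_features(frame_embeddings, clip_len=4, num_layers=2)` takes a list of equal-length frame embeddings and returns one vector of the same dimension; a regression head (not shown) would map that vector to a quality score.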

