T2VEval: Benchmark dataset and objective evaluation method for T2V-generated videos

IF 3.4 2区 工程技术 Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
Zelu Qi , Ping Shi , Shuqi Wang , Chaoyang Zhang , Fei Zhao , Zefeng Ying , Da Pan , Xi Yang , Zheqi He , Teng Dai
{"title":"T2VEval: Benchmark dataset and objective evaluation method for T2V-generated videos","authors":"Zelu Qi ,&nbsp;Ping Shi ,&nbsp;Shuqi Wang ,&nbsp;Chaoyang Zhang ,&nbsp;Fei Zhao ,&nbsp;Zefeng Ying ,&nbsp;Da Pan ,&nbsp;Xi Yang ,&nbsp;Zheqi He ,&nbsp;Teng Dai","doi":"10.1016/j.displa.2025.103178","DOIUrl":null,"url":null,"abstract":"<div><div>Recent advances in text-to-video (T2V) technology, as demonstrated by models such as Runway Gen-3, Pika, Sora, and Kling, have significantly broadened the applicability and popularity of the technology. This progress has created a growing demand for accurate quality assessment metrics to evaluate the perceptual quality of T2V-generated videos and optimize video generation models. However, assessing the quality of text-to-video outputs remain challenging due to the presence of highly complex distortions, such as unnatural actions and phenomena that defy human cognition. To address these challenges, we constructed T2VEval-Bench, a multi-dimensional benchmark dataset for text-to-video quality evaluation, which contains 148 textual prompts and 1,783 videos generated by 13 T2V models. To ensure a comprehensive evaluation, we scored each video on four dimensions in the subjective experiment, which are overall impression, text–video consistency, realness, and technical quality. Based on T2VEval-Bench, we developed T2VEval, a multi-branch fusion scheme for T2V quality evaluation. T2VEval assesses videos across three branches: text–video consistency, realness, and technical quality. Using an attention-based fusion module, T2VEval effectively integrates features from each branch and predicts scores with the aid of a large language model. Additionally, we implemented a divide-and-conquer training strategy, enabling each branch to learn targeted knowledge while maintaining synergy with the others. Experimental results demonstrate that T2VEval achieves state-of-the-art performance across multiple metrics.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"91 ","pages":"Article 103178"},"PeriodicalIF":3.4000,"publicationDate":"2025-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Displays","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S014193822500215X","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
引用次数: 0

Abstract

Recent advances in text-to-video (T2V) technology, as demonstrated by models such as Runway Gen-3, Pika, Sora, and Kling, have significantly broadened the applicability and popularity of the technology. This progress has created a growing demand for accurate quality assessment metrics to evaluate the perceptual quality of T2V-generated videos and optimize video generation models. However, assessing the quality of text-to-video outputs remain challenging due to the presence of highly complex distortions, such as unnatural actions and phenomena that defy human cognition. To address these challenges, we constructed T2VEval-Bench, a multi-dimensional benchmark dataset for text-to-video quality evaluation, which contains 148 textual prompts and 1,783 videos generated by 13 T2V models. To ensure a comprehensive evaluation, we scored each video on four dimensions in the subjective experiment, which are overall impression, text–video consistency, realness, and technical quality. Based on T2VEval-Bench, we developed T2VEval, a multi-branch fusion scheme for T2V quality evaluation. T2VEval assesses videos across three branches: text–video consistency, realness, and technical quality. Using an attention-based fusion module, T2VEval effectively integrates features from each branch and predicts scores with the aid of a large language model. Additionally, we implemented a divide-and-conquer training strategy, enabling each branch to learn targeted knowledge while maintaining synergy with the others. Experimental results demonstrate that T2VEval achieves state-of-the-art performance across multiple metrics.
T2VEval: t2v生成视频的基准数据集和客观评价方法
文本到视频(T2V)技术的最新进展,如Runway Gen-3、Pika、Sora和Kling等模型所证明的那样,大大拓宽了该技术的适用性和普及程度。这一进展导致对准确质量评估指标的需求不断增长,以评估ttv生成的视频的感知质量,并优化视频生成模型。然而,由于存在高度复杂的扭曲,例如违背人类认知的非自然行为和现象,评估文本到视频输出的质量仍然具有挑战性。为了解决这些挑战,我们构建了T2V eval - bench,这是一个用于文本到视频质量评估的多维基准数据集,其中包含由13个T2V模型生成的148个文本提示和1783个视频。为了保证评价的全面性,我们在主观实验中从整体印象、文字视频一致性、真实性和技术质量四个维度对每个视频进行打分。在T2VEval- bench的基础上,提出了T2VEval多分支融合的T2V质量评估方案。T2VEval从三个方面对视频进行评估:文本视频一致性、真实性和技术质量。使用基于注意力的融合模块,T2VEval有效地整合了每个分支的特征,并借助大型语言模型预测分数。此外,我们实施了分而治之的培训策略,使每个分支都能学习目标知识,同时保持与其他分支的协同作用。实验结果表明,T2VEval在多个指标上实现了最先进的性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Displays
Displays 工程技术-工程:电子与电气
CiteScore
4.60
自引率
25.60%
发文量
138
审稿时长
92 days
期刊介绍: Displays is the international journal covering the research and development of display technology, its effective presentation and perception of information, and applications and systems including display-human interface. Technical papers on practical developments in Displays technology provide an effective channel to promote greater understanding and cross-fertilization across the diverse disciplines of the Displays community. Original research papers solving ergonomics issues at the display-human interface advance effective presentation of information. Tutorial papers covering fundamentals intended for display technologies and human factor engineers new to the field will also occasionally featured.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信