{"title":"Going Big or Going Precise: Considerations in building the next-gen VQA Database","authors":"Franz Götz-Hahn","doi":"10.1145/3423268.3423587","DOIUrl":null,"url":null,"abstract":"Annotated data is a requirement for any kind of modeling of subjective attributes and is usually constrained by a fixed budget available for paying annotators. The distribution of this budget is non-trivial, if the available data is large enough. In the case of video quality assessment (VQA) datasets, it has been commonly deemed more important to evaluate at a higher precision, i.e. getting more annotations for each item, than getting more data annotated less precisely. Considering the highly complex way different technical quality impairments caused by different parts of multiple video processing pipelines interact, the few hundred items comprising existing VQA datasets are unlikely to cover the vast degradation space required to generalize well. An open question, then, is whether some annotation precision can be sacrificed for additional data without loss of generalization power. How does shifting the vote budget from say 1,000 items at 100 annotations to 100,000 items with a single annotation affect predictive performances of state-of-the-art models? This talk addresses this question at the hand of a new large-scale two-part VQA dataset [1] comprising, on the one hand, over 1,500 items annotated with a minimum of 89 votes and, on the other hand, over 150,000 items annotated with 5 votes. Based on this dataset, different VQA approaches were compared at different distributions of a fixed vote budget and, surprisingly, their generalization performance was found to be invariant to this distribution of the budget. This held true for the typical within-dataset testing as well as cross-dataset testing.","PeriodicalId":393702,"journal":{"name":"Joint Workshop on Aesthetic and Technical Quality Assessment of Multimedia and Media Analytics for Societal Trends","volume":"366 ","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Joint Workshop on Aesthetic and Technical Quality Assessment of Multimedia and Media Analytics for Societal Trends","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3423268.3423587","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Annotated data is a prerequisite for any kind of modeling of subjective attributes and is usually constrained by a fixed budget available for paying annotators. Distributing this budget is non-trivial when the available data is large enough. For video quality assessment (VQA) datasets, it has commonly been considered more important to annotate at higher precision, i.e. to collect more annotations per item, than to annotate more data less precisely. Given the highly complex ways in which technical quality impairments introduced by different stages of multiple video processing pipelines interact, the few hundred items that make up existing VQA datasets are unlikely to cover the vast degradation space required to generalize well. An open question, then, is whether some annotation precision can be sacrificed for additional data without losing generalization power. How does shifting the vote budget from, say, 1,000 items with 100 annotations each to 100,000 items with a single annotation each affect the predictive performance of state-of-the-art models? This talk addresses this question using a new large-scale two-part VQA dataset [1] comprising, on the one hand, over 1,500 items annotated with a minimum of 89 votes and, on the other hand, over 150,000 items annotated with 5 votes each. Based on this dataset, different VQA approaches were compared under different distributions of a fixed vote budget and, surprisingly, their generalization performance was found to be invariant to how the budget was distributed. This held true for typical within-dataset testing as well as for cross-dataset testing.
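To make the budget trade-off concrete, the following is a minimal, self-contained sketch (not the authors' code or dataset) of how a fixed vote budget can be split between the number of items and the number of votes per item, and how that split affects the noise in the resulting per-item mean opinion scores (MOS). The function name, the Gaussian vote-noise model, and the ground-truth scores are hypothetical illustrations introduced here, not properties of the dataset in [1].

```python
"""Illustrative sketch: fixed vote budget split across items vs. votes per item."""
import random
import statistics


def simulate_mos_error(n_items, votes_per_item, true_scores, noise_sd=0.8, seed=0):
    """Estimate the mean absolute MOS error when each of n_items receives
    votes_per_item noisy votes on a 1-5 scale (hypothetical noise model)."""
    rng = random.Random(seed)
    errors = []
    for i in range(n_items):
        true = true_scores[i % len(true_scores)]
        # Each vote is the true score plus Gaussian observer noise, clamped to [1, 5].
        votes = [min(5.0, max(1.0, rng.gauss(true, noise_sd)))
                 for _ in range(votes_per_item)]
        errors.append(abs(statistics.mean(votes) - true))
    return statistics.mean(errors)


if __name__ == "__main__":
    budget = 100_000                                      # fixed total number of votes
    true_scores = [1.5 + 0.035 * k for k in range(100)]   # hypothetical ground-truth MOS values
    for n_items in (1_000, 10_000, 100_000):
        votes_per_item = budget // n_items
        err = simulate_mos_error(n_items, votes_per_item, true_scores)
        print(f"{n_items:>7} items x {votes_per_item:>3} votes/item "
              f"-> mean |MOS error| ~ {err:.3f}")
```

As expected, per-item MOS estimates get noisier as the budget is spread over more items; the empirical question raised in the talk is whether that extra label noise is offset, for model training, by the much broader coverage of the degradation space.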