{"title":"Going Big or Going Precise: Considerations in building the next-gen VQA Database","authors":"Franz Götz-Hahn","doi":"10.1145/3423268.3423587","DOIUrl":null,"url":null,"abstract":"Annotated data is a requirement for any kind of modeling of subjective attributes and is usually constrained by a fixed budget available for paying annotators. The distribution of this budget is non-trivial, if the available data is large enough. In the case of video quality assessment (VQA) datasets, it has been commonly deemed more important to evaluate at a higher precision, i.e. getting more annotations for each item, than getting more data annotated less precisely. Considering the highly complex way different technical quality impairments caused by different parts of multiple video processing pipelines interact, the few hundred items comprising existing VQA datasets are unlikely to cover the vast degradation space required to generalize well. An open question, then, is whether some annotation precision can be sacrificed for additional data without loss of generalization power. How does shifting the vote budget from say 1,000 items at 100 annotations to 100,000 items with a single annotation affect predictive performances of state-of-the-art models? This talk addresses this question at the hand of a new large-scale two-part VQA dataset [1] comprising, on the one hand, over 1,500 items annotated with a minimum of 89 votes and, on the other hand, over 150,000 items annotated with 5 votes. Based on this dataset, different VQA approaches were compared at different distributions of a fixed vote budget and, surprisingly, their generalization performance was found to be invariant to this distribution of the budget. This held true for the typical within-dataset testing as well as cross-dataset testing.","PeriodicalId":393702,"journal":{"name":"Joint Workshop on Aesthetic and Technical Quality Assessment of Multimedia and Media Analytics for Societal Trends","volume":"366 ","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Joint Workshop on Aesthetic and Technical Quality Assessment of Multimedia and Media Analytics for Societal Trends","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3423268.3423587","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Annotated data is a prerequisite for any kind of modeling of subjective attributes and is usually constrained by a fixed budget available for paying annotators. Distributing this budget is non-trivial when the available data is large enough. For video quality assessment (VQA) datasets, it has commonly been considered more important to annotate at higher precision, i.e. to collect more annotations per item, than to annotate more data less precisely. Given the highly complex ways in which technical quality impairments introduced by different stages of multiple video processing pipelines interact, the few hundred items that make up existing VQA datasets are unlikely to cover the vast degradation space required to generalize well. An open question, then, is whether some annotation precision can be sacrificed for additional data without losing generalization power. How does shifting the vote budget from, say, 1,000 items with 100 annotations each to 100,000 items with a single annotation each affect the predictive performance of state-of-the-art models? This talk addresses this question using a new large-scale two-part VQA dataset [1] comprising, on the one hand, over 1,500 items annotated with a minimum of 89 votes and, on the other hand, over 150,000 items annotated with 5 votes each. Based on this dataset, different VQA approaches were compared under different distributions of a fixed vote budget and, surprisingly, their generalization performance was found to be invariant to how the budget was distributed. This held true for typical within-dataset testing as well as for cross-dataset testing.
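To make the budget trade-off concrete, the following is a minimal, self-contained sketch (not the authors' code or dataset) of how a fixed vote budget can be split between the number of items and the number of votes per item, and how that split affects the noise in the resulting per-item mean opinion scores (MOS). The function name, the Gaussian vote-noise model, and the ground-truth scores are hypothetical illustrations introduced here, not properties of the dataset in [1].

```python
"""Illustrative sketch: fixed vote budget split across items vs. votes per item."""
import random
import statistics


def simulate_mos_error(n_items, votes_per_item, true_scores, noise_sd=0.8, seed=0):
    """Estimate the mean absolute MOS error when each of n_items receives
    votes_per_item noisy votes on a 1-5 scale (hypothetical noise model)."""
    rng = random.Random(seed)
    errors = []
    for i in range(n_items):
        true = true_scores[i % len(true_scores)]
        # Each vote is the true score plus Gaussian observer noise, clamped to [1, 5].
        votes = [min(5.0, max(1.0, rng.gauss(true, noise_sd)))
                 for _ in range(votes_per_item)]
        errors.append(abs(statistics.mean(votes) - true))
    return statistics.mean(errors)


if __name__ == "__main__":
    budget = 100_000                                      # fixed total number of votes
    true_scores = [1.5 + 0.035 * k for k in range(100)]   # hypothetical ground-truth MOS values
    for n_items in (1_000, 10_000, 100_000):
        votes_per_item = budget // n_items
        err = simulate_mos_error(n_items, votes_per_item, true_scores)
        print(f"{n_items:>7} items x {votes_per_item:>3} votes/item "
              f"-> mean |MOS error| ~ {err:.3f}")
```

As expected, per-item MOS estimates get noisier as the budget is spread over more items; the empirical question raised in the talk is whether that extra label noise is offset, for model training, by the much broader coverage of the degradation space.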