Philippe Hanhart, Lukáš Krasula, P. Le Callet, T. Ebrahimi
2016 Eighth International Conference on Quality of Multimedia Experience (QoMEX), pp. 1-6, published 2016-06-06. DOI: 10.1109/QoMEX.2016.7498960
How to benchmark objective quality metrics from paired comparison data?
The procedures commonly used to evaluate the performance of objective quality metrics rely on ground-truth mean opinion scores (MOS) and their associated confidence intervals, which are usually obtained via direct scaling methods. However, indirect scaling methods, such as the paired comparison method, can also be used to collect ground-truth preference scores. Indirect scaling methods have higher discriminatory power and are gaining popularity, for example in crowdsourcing evaluations. In this paper, we show how classification errors, an existing analysis tool, can also be applied to subjective preference scores. Additionally, we propose a new analysis tool based on receiver operating characteristic (ROC) analysis, which can be used to further assess the performance of objective metrics against ground-truth preference scores. We provide a MATLAB script implementing the proposed tools and demonstrate their use on one example application.
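The abstract does not spell out the computations, so what follows is a minimal Python sketch of how the two kinds of analysis could be applied to paired comparison data. It is not the authors' MATLAB script, and its details are assumptions: each pair carries vote counts for stimuli A and B with no tie votes; a two-sided exact binomial test at alpha = 0.05 decides whether a preference is significant; higher metric values mean better quality; and the objective threshold `delta` is a free parameter. The four classification-error categories (correct decision, false ranking, false differentiation, false tie) follow the usual definitions of that tool. The ROC part is a simplified stand-in: one area under the curve (AUC) for separating significantly different pairs from non-significant ones using the absolute metric difference, and one for separating preferred from non-preferred orientations using the signed difference. All function names are hypothetical.

```python
from math import comb, nan

# Hypothetical pair format: (votes_for_a, votes_for_b, metric_a, metric_b).
Pair = tuple[int, int, float, float]


def binom_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial test p-value for k of n votes under p = 0.5."""
    # Under p = 0.5 the distribution is symmetric, so double the smaller tail.
    lower = sum(comb(n, i) for i in range(0, k + 1)) / 2 ** n
    upper = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * min(lower, upper))


def subjective_label(votes_a: int, votes_b: int, alpha: float = 0.05) -> int:
    """+1 if A is significantly preferred, -1 if B is, 0 if not significant."""
    n = votes_a + votes_b
    if n == 0 or binom_two_sided_p(votes_a, n) >= alpha:
        return 0
    return 1 if votes_a > votes_b else -1


def objective_label(diff: float, delta: float) -> int:
    """Metric decision: +1 (A better), -1 (B better), 0 (tie within delta)."""
    if diff > delta:
        return 1
    if diff < -delta:
        return -1
    return 0


def classification_errors(pairs: list[Pair], delta: float) -> dict[str, float]:
    """Fraction of pairs falling into each classification-error category."""
    counts = {"correct_decision": 0, "false_ranking": 0,
              "false_differentiation": 0, "false_tie": 0}
    for votes_a, votes_b, m_a, m_b in pairs:
        s = subjective_label(votes_a, votes_b)
        o = objective_label(m_a - m_b, delta)
        if s == o:
            counts["correct_decision"] += 1
        elif s == 0:  # viewers saw no difference, metric claims one
            counts["false_differentiation"] += 1
        elif o == 0:  # viewers saw a difference, metric calls it a tie
            counts["false_tie"] += 1
        else:  # both see a difference but in opposite directions
            counts["false_ranking"] += 1
    return {name: c / len(pairs) for name, c in counts.items()}


def auc(pos: list[float], neg: list[float]) -> float:
    """Mann-Whitney estimate of ROC area: P(pos > neg) + 0.5 * P(pos == neg)."""
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def roc_analysis(pairs: list[Pair]) -> tuple[float, float, float]:
    """Return (AUC different-vs-similar, AUC better-vs-worse, fraction correct)."""
    different, similar, oriented = [], [], []
    for votes_a, votes_b, m_a, m_b in pairs:
        s = subjective_label(votes_a, votes_b)
        diff = m_a - m_b
        if s == 0:
            similar.append(abs(diff))
        else:
            different.append(abs(diff))
            oriented.append(diff * s)  # preferred minus non-preferred metric
    auc_ds = auc(different, similar) if different and similar else nan
    if not oriented:
        return auc_ds, nan, nan
    auc_bw = auc(oriented, [-d for d in oriented])  # mirrored "worse" class
    c0 = sum(d > 0 for d in oriented) / len(oriented)
    return auc_ds, auc_bw, c0


if __name__ == "__main__":
    # Made-up illustrative data, 20 votes per pair.
    demo = [(18, 2, 0.90, 0.60), (3, 17, 0.55, 0.80), (11, 9, 0.70, 0.69),
            (16, 4, 0.50, 0.65), (15, 5, 0.62, 0.60)]
    print(classification_errors(demo, delta=0.05))
    print(roc_analysis(demo))
```

Because the classification-error rates change with `delta`, one would typically report them over a range of thresholds; the AUC values summarize how well the metric separates the subjective classes without fixing a single threshold.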