基于语言的图像质量评估

ACM Multimedia Asia Pub Date : 2021-12-01 DOI:10.1145/3469877.3490605

L. Galteri, Lorenzo Seidenari, P. Bongini, M. Bertini, A. Bimbo

{"title":"基于语言的图像质量评估","authors":"L. Galteri, Lorenzo Seidenari, P. Bongini, M. Bertini, A. Bimbo","doi":"10.1145/3469877.3490605","DOIUrl":null,"url":null,"abstract":"Evaluation of generative models, in the visual domain, is often performed providing anecdotal results to the reader. In the case of image enhancement, reference images are usually available. Nonetheless, using signal based metrics often leads to counterintuitive results: highly natural crisp images may obtain worse scores than blurry ones. On the other hand, blind reference image assessment may rank images reconstructed with GANs higher than the original undistorted images. To avoid time consuming human based image assessment, semantic computer vision tasks may be exploited instead [9, 25, 33]. In this paper we advocate the use of language generation tasks to evaluate the quality of restored images. We show experimentally that image captioning, used as a downstream task, may serve as a method to score image quality. Captioning scores are better aligned with human rankings with respect to signal based metrics or no-reference image quality metrics. We show insights on how the corruption, by artifacts, of local image structure may steer image captions in the wrong direction.","PeriodicalId":210974,"journal":{"name":"ACM Multimedia Asia","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Language Based Image Quality Assessment\",\"authors\":\"L. Galteri, Lorenzo Seidenari, P. Bongini, M. Bertini, A. Bimbo\",\"doi\":\"10.1145/3469877.3490605\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Evaluation of generative models, in the visual domain, is often performed providing anecdotal results to the reader. In the case of image enhancement, reference images are usually available. Nonetheless, using signal based metrics often leads to counterintuitive results: highly natural crisp images may obtain worse scores than blurry ones. On the other hand, blind reference image assessment may rank images reconstructed with GANs higher than the original undistorted images. To avoid time consuming human based image assessment, semantic computer vision tasks may be exploited instead [9, 25, 33]. In this paper we advocate the use of language generation tasks to evaluate the quality of restored images. We show experimentally that image captioning, used as a downstream task, may serve as a method to score image quality. Captioning scores are better aligned with human rankings with respect to signal based metrics or no-reference image quality metrics. We show insights on how the corruption, by artifacts, of local image structure may steer image captions in the wrong direction.\",\"PeriodicalId\":210974,\"journal\":{\"name\":\"ACM Multimedia Asia\",\"volume\":\"23 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Multimedia Asia\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3469877.3490605\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Multimedia Asia","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3469877.3490605","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 4

摘要

在视觉领域，生成模型的评估通常是为读者提供轶事结果。在图像增强的情况下，通常可以使用参考图像。尽管如此，使用基于信号的指标往往会导致违反直觉的结果:高度自然的清晰图像可能比模糊的图像获得更低的分数。另一方面，盲参考图像评估可能会使gan重建的图像排名高于原始未失真图像。为了避免耗时的基于人类的图像评估，可以利用语义计算机视觉任务来代替[9,25,33]。在本文中，我们提倡使用语言生成任务来评估恢复图像的质量。我们通过实验证明，作为下游任务的图像字幕可以作为图像质量评分的一种方法。相对于基于信号的指标或无参考图像质量指标，字幕分数更好地与人类排名保持一致。我们展示了局部图像结构的人为破坏如何将图像标题导向错误方向的见解。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Language Based Image Quality Assessment

Evaluation of generative models, in the visual domain, is often performed providing anecdotal results to the reader. In the case of image enhancement, reference images are usually available. Nonetheless, using signal based metrics often leads to counterintuitive results: highly natural crisp images may obtain worse scores than blurry ones. On the other hand, blind reference image assessment may rank images reconstructed with GANs higher than the original undistorted images. To avoid time consuming human based image assessment, semantic computer vision tasks may be exploited instead [9, 25, 33]. In this paper we advocate the use of language generation tasks to evaluate the quality of restored images. We show experimentally that image captioning, used as a downstream task, may serve as a method to score image quality. Captioning scores are better aligned with human rankings with respect to signal based metrics or no-reference image quality metrics. We show insights on how the corruption, by artifacts, of local image structure may steer image captions in the wrong direction.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ACM Multimedia Asia

自引率

0.00%

发文量