文本到图像生成的质量度量研究。

IF 6.5

IEEE transactions on visualization and computer graphics Pub Date : 2025-07-01 DOI:10.1109/TVCG.2025.3585077

Sebastian Hartwig, Dominik Engel, Leon Sick, Hannah Kniesel, Tristan Payer, Poonam Poonam, Michael Glockler, Alex Bauerle, Timo Ropinski

{"title":"文本到图像生成的质量度量研究。","authors":"Sebastian Hartwig, Dominik Engel, Leon Sick, Hannah Kniesel, Tristan Payer, Poonam Poonam, Michael Glockler, Alex Bauerle, Timo Ropinski","doi":"10.1109/TVCG.2025.3585077","DOIUrl":null,"url":null,"abstract":"AI-based text-to-image models do not only excel at generating realistic images, they also give designers more and more fine-grained control over the image content. Consequently, these approaches have gathered increased attention within the computer graphics research community, which has been historically devoted towards traditional rendering techniques, that offer precise control over scene parameters (e.g., objects, materials, and lighting). While the quality of conventionally rendered images is assessed through well established image quality metrics, such as SSIM or PSNR, the unique challenges of text-to-image generation require other, dedicated quality metrics. These metrics must be able to not only measure overall image quality, but also how well images reflect given text prompts, whereby the control of scene and rendering parameters is interweaved. Within this survey, we provide a comprehensive overview of such text-to-image quality metrics, and propose a taxonomy to categorize these metrics. Our taxonomy is grounded in the assumption, that there are two main quality criteria, namely compositional quality and general quality, that contribute to the overall image quality. Besides the metrics, this survey covers dedicated text-to-image benchmark datasets, over which the metrics are frequently computed. Finally, we identify limitations and open challenges in the field of text-to-image generation, and derive guidelines for practitioners conducting text-to-image evaluation.","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":""},"PeriodicalIF":6.5000,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Survey on Quality Metrics for Text-to-Image Generation.\",\"authors\":\"Sebastian Hartwig, Dominik Engel, Leon Sick, Hannah Kniesel, Tristan Payer, Poonam Poonam, Michael Glockler, Alex Bauerle, Timo Ropinski\",\"doi\":\"10.1109/TVCG.2025.3585077\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"AI-based text-to-image models do not only excel at generating realistic images, they also give designers more and more fine-grained control over the image content. Consequently, these approaches have gathered increased attention within the computer graphics research community, which has been historically devoted towards traditional rendering techniques, that offer precise control over scene parameters (e.g., objects, materials, and lighting). While the quality of conventionally rendered images is assessed through well established image quality metrics, such as SSIM or PSNR, the unique challenges of text-to-image generation require other, dedicated quality metrics. These metrics must be able to not only measure overall image quality, but also how well images reflect given text prompts, whereby the control of scene and rendering parameters is interweaved. Within this survey, we provide a comprehensive overview of such text-to-image quality metrics, and propose a taxonomy to categorize these metrics. Our taxonomy is grounded in the assumption, that there are two main quality criteria, namely compositional quality and general quality, that contribute to the overall image quality. Besides the metrics, this survey covers dedicated text-to-image benchmark datasets, over which the metrics are frequently computed. Finally, we identify limitations and open challenges in the field of text-to-image generation, and derive guidelines for practitioners conducting text-to-image evaluation.\",\"PeriodicalId\":94035,\"journal\":{\"name\":\"IEEE transactions on visualization and computer graphics\",\"volume\":\"PP \",\"pages\":\"\"},\"PeriodicalIF\":6.5000,\"publicationDate\":\"2025-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on visualization and computer graphics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/TVCG.2025.3585077\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on visualization and computer graphics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TVCG.2025.3585077","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

基于人工智能的文本到图像模型不仅擅长生成逼真的图像，而且还为设计师提供了对图像内容越来越细粒度的控制。因此，这些方法在计算机图形学研究社区中引起了越来越多的关注，这些研究社区一直致力于传统的渲染技术，这些技术提供了对场景参数（例如，对象，材料和照明）的精确控制。虽然传统渲染图像的质量是通过完善的图像质量指标（如SSIM或PSNR）来评估的，但文本到图像生成的独特挑战需要其他专用的质量指标。这些指标必须不仅能够衡量整体图像质量，还能很好地反映给定的文本提示，从而控制场景和渲染参数交织在一起。在本调查中，我们提供了文本到图像质量指标的全面概述，并提出了对这些指标进行分类的分类法。我们的分类是基于这样的假设，即有两个主要的质量标准，即构图质量和一般质量，这有助于整体图像质量。除了度量之外，本调查还涵盖了专用的文本到图像的基准数据集，在这些数据集上经常计算度量。最后，我们确定了文本到图像生成领域的局限性和开放挑战，并为从业者进行文本到图像评估提供了指导方针。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

A Survey on Quality Metrics for Text-to-Image Generation.

AI-based text-to-image models do not only excel at generating realistic images, they also give designers more and more fine-grained control over the image content. Consequently, these approaches have gathered increased attention within the computer graphics research community, which has been historically devoted towards traditional rendering techniques, that offer precise control over scene parameters (e.g., objects, materials, and lighting). While the quality of conventionally rendered images is assessed through well established image quality metrics, such as SSIM or PSNR, the unique challenges of text-to-image generation require other, dedicated quality metrics. These metrics must be able to not only measure overall image quality, but also how well images reflect given text prompts, whereby the control of scene and rendering parameters is interweaved. Within this survey, we provide a comprehensive overview of such text-to-image quality metrics, and propose a taxonomy to categorize these metrics. Our taxonomy is grounded in the assumption, that there are two main quality criteria, namely compositional quality and general quality, that contribute to the overall image quality. Besides the metrics, this survey covers dedicated text-to-image benchmark datasets, over which the metrics are frequently computed. Finally, we identify limitations and open challenges in the field of text-to-image generation, and derive guidelines for practitioners conducting text-to-image evaluation.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE transactions on visualization and computer graphics

自引率

0.00%

发文量