ABHINAW: A method for Automatic Evaluation of Typography within AI-Generated Images
Authors: Abhinaw Jagtap, Nachiket Tapas, R. G. Brajesh
Published: 2024-09-18, arXiv - EE - Image and Video Processing (arxiv-2409.11874)
Citations: 0
Abstract
In the fast-evolving field of Generative AI, platforms like MidJourney,
DALL-E, and Stable Diffusion have transformed Text-to-Image (T2I) Generation.
However, despite their impressive ability to create high-quality images, they
often struggle to generate accurate text within these images. Theoretically, if
we could achieve accurate text generation in AI images in a "zero-shot"
manner, it would not only make AI-generated images more meaningful but also
democratize the graphic design industry. The first step towards this goal is to
create a robust scoring matrix for evaluating text accuracy in AI-generated
images. Although there are existing benchmarking methods like CLIPScore and
T2I-CompBench++, there's still a gap in systematically evaluating text and
typography in AI-generated images, especially with diffusion-based methods. In
this paper, we introduce a novel evaluation matrix designed explicitly for
quantifying the performance of text and typography generation within
AI-generated images. We use a letter-by-letter matching strategy to compute
exact matching scores between the reference text and the AI-generated text. Our
approach to calculating the score handles multiple redundancies, such as
repetition of words, case sensitivity, mixing of words, and irregular
incorporation of letters. Moreover, we have developed a novel method, named
brevity adjustment, to handle excess text. In addition, we present a
quantitative analysis of the frequent errors that arise with frequently used
and less frequently used words. The project page is available at:
https://github.com/Abhinaw3906/ABHINAW-MATRIX.
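
The abstract does not give the scoring formula, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a simple multiset letter overlap as a stand-in for letter-by-letter matching, and a hypothetical length-ratio penalty as a stand-in for the brevity adjustment. The function names `letter_match_score`, `excess_penalty`, and `typography_score` are invented for this example.

```python
from collections import Counter


def _letters(text: str) -> list[str]:
    """Return the non-whitespace characters of the text."""
    return [ch for ch in text if not ch.isspace()]


def letter_match_score(reference: str, generated: str,
                       case_sensitive: bool = False) -> float:
    """Fraction of reference letters also found in the generated text.

    Assumption: matching is an order-agnostic multiset overlap, which is
    a simplification and not the formula used in the paper.
    """
    if not case_sensitive:
        reference, generated = reference.lower(), generated.lower()
    ref_counts = Counter(_letters(reference))
    gen_counts = Counter(_letters(generated))
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(n, gen_counts[ch]) for ch, n in ref_counts.items())
    return matched / total


def excess_penalty(reference: str, generated: str) -> float:
    """Hypothetical penalty for excess text: 1.0 when the generated text is
    no longer than the reference, otherwise the ratio of the two lengths."""
    ref_len = len(_letters(reference))
    gen_len = len(_letters(generated))
    if gen_len <= ref_len:
        return 1.0
    return ref_len / gen_len


def typography_score(reference: str, generated: str,
                     case_sensitive: bool = False) -> float:
    """Combine the letter overlap with the excess-text penalty."""
    overlap = letter_match_score(reference, generated, case_sensitive)
    return overlap * excess_penalty(reference, generated)
```

In this sketch a repeated word such as "OPEN OPEN" against the reference "OPEN" keeps a full letter overlap but is scaled down by the length ratio, which is one simple way to discourage excess text. The paper's actual scoring and brevity adjustment may differ substantially.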