Tianwei Zhou , Songbai Tan , Leida Li , Baoquan Zhao , Qiuping Jiang , Guanghui Yue
{"title":"人工智能生成图像质量评估的跨模态交互关注网络","authors":"Tianwei Zhou , Songbai Tan , Leida Li , Baoquan Zhao , Qiuping Jiang , Guanghui Yue","doi":"10.1016/j.patcog.2025.111693","DOIUrl":null,"url":null,"abstract":"<div><div>Recently, AI-generative techniques have revolutionized image creation, prompting the need for AI-generated image (AGI) quality assessment. This paper introduces CIA-Net, a Cross-modality Interactive Attention Network, for blind AGI quality evaluation. Using a multi-task framework, CIA-Net processes text and image inputs to output consistency, visual quality, and authenticity scores. Specifically, CIA-Net first encodes two-modal data to obtain textual and visual embeddings. Next, for consistency score prediction, it computes the similarity between these two kinds of embeddings in view of that text-to-image alignment. For visual quality prediction, it fuses textural and visual embeddings using a well-designed cross-modality interactive attention module. For authenticity score prediction, it constructs a textural template that contains authenticity labels and computes the joint probability from the similarity between the textural embeddings of each element and the visual embeddings. Experimental results show that CIA-Net is more competent for the AGI quality assessment task than 11 state-of-the-art competing methods.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"167 ","pages":"Article 111693"},"PeriodicalIF":7.5000,"publicationDate":"2025-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Cross-Modality Interactive Attention Network for AI-generated image quality assessment\",\"authors\":\"Tianwei Zhou , Songbai Tan , Leida Li , Baoquan Zhao , Qiuping Jiang , Guanghui Yue\",\"doi\":\"10.1016/j.patcog.2025.111693\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Recently, AI-generative techniques have revolutionized image creation, prompting the need for AI-generated image (AGI) quality assessment. This paper introduces CIA-Net, a Cross-modality Interactive Attention Network, for blind AGI quality evaluation. Using a multi-task framework, CIA-Net processes text and image inputs to output consistency, visual quality, and authenticity scores. Specifically, CIA-Net first encodes two-modal data to obtain textual and visual embeddings. Next, for consistency score prediction, it computes the similarity between these two kinds of embeddings in view of that text-to-image alignment. For visual quality prediction, it fuses textural and visual embeddings using a well-designed cross-modality interactive attention module. For authenticity score prediction, it constructs a textural template that contains authenticity labels and computes the joint probability from the similarity between the textural embeddings of each element and the visual embeddings. 
Experimental results show that CIA-Net is more competent for the AGI quality assessment task than 11 state-of-the-art competing methods.</div></div>\",\"PeriodicalId\":49713,\"journal\":{\"name\":\"Pattern Recognition\",\"volume\":\"167 \",\"pages\":\"Article 111693\"},\"PeriodicalIF\":7.5000,\"publicationDate\":\"2025-04-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Pattern Recognition\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S003132032500353X\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S003132032500353X","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Cross-Modality Interactive Attention Network for AI-generated image quality assessment
Recently, AI-generative techniques have revolutionized image creation, prompting the need for AI-generated image (AGI) quality assessment. This paper introduces CIA-Net, a Cross-Modality Interactive Attention Network, for blind AGI quality evaluation. Using a multi-task framework, CIA-Net processes text and image inputs and outputs consistency, visual quality, and authenticity scores. Specifically, CIA-Net first encodes the data from the two modalities to obtain textual and visual embeddings. For consistency score prediction, it computes the similarity between these two kinds of embeddings, which reflects the degree of text-to-image alignment. For visual quality prediction, it fuses the textual and visual embeddings using a well-designed cross-modality interactive attention module. For authenticity score prediction, it constructs a textual template containing authenticity labels and computes a joint probability from the similarity between the textual embedding of each template element and the visual embedding. Experimental results show that CIA-Net is more competent for the AGI quality assessment task than 11 state-of-the-art competing methods.
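To make the three-branch scoring idea in the abstract concrete, the following is a minimal PyTorch sketch, not the authors' implementation. The encoder choice (CLIP-like embeddings), the embedding size, the use of standard multi-head cross-attention as a stand-in for the paper's cross-modality interactive attention module, and the authenticity-template handling are all assumptions made purely for illustration.

```python
# Minimal sketch of a three-branch AGI scorer: consistency from text-image
# similarity, visual quality from cross-attention fusion, authenticity from
# similarity to a label template. All module names and sizes are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalityAttentionFusion(nn.Module):
    """Fuses text and image embeddings with standard multi-head cross-attention
    (a generic stand-in for the paper's interactive attention module)."""

    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 1))

    def forward(self, text_emb, img_emb):
        # Treat each embedding as a length-1 token sequence: the image query
        # attends to the text key/value.
        q = img_emb.unsqueeze(1)            # (B, 1, D)
        kv = text_emb.unsqueeze(1)          # (B, 1, D)
        fused, _ = self.attn(q, kv, kv)     # (B, 1, D)
        return self.head(fused.squeeze(1))  # (B, 1) visual-quality score


class ThreeBranchAGIScorer(nn.Module):
    def __init__(self, dim=512, num_authenticity_labels=2):
        super().__init__()
        self.fusion = CrossModalityAttentionFusion(dim)
        # Embeddings of an authenticity template (e.g. "a real photo" vs.
        # "an AI-generated picture"); learned parameters here for simplicity.
        self.template_emb = nn.Parameter(torch.randn(num_authenticity_labels, dim))

    def forward(self, text_emb, img_emb):
        text_emb = F.normalize(text_emb, dim=-1)
        img_emb = F.normalize(img_emb, dim=-1)

        # 1) Consistency: cosine similarity between prompt and image embeddings.
        consistency = (text_emb * img_emb).sum(dim=-1, keepdim=True)

        # 2) Visual quality: cross-attention fusion followed by a regression head.
        quality = self.fusion(text_emb, img_emb)

        # 3) Authenticity: softmax over image-template similarities yields a
        #    probability for each authenticity label.
        sims = img_emb @ F.normalize(self.template_emb, dim=-1).t()
        authenticity = sims.softmax(dim=-1)

        return consistency, quality, authenticity


if __name__ == "__main__":
    # Stand-in embeddings; in practice they would come from text/image encoders
    # (e.g. CLIP) applied to the prompt and the generated image.
    text_emb = torch.randn(4, 512)
    img_emb = torch.randn(4, 512)
    model = ThreeBranchAGIScorer()
    c, q, a = model(text_emb, img_emb)
    print(c.shape, q.shape, a.shape)  # expected: (4, 1), (4, 1), (4, 2)
```

The three outputs correspond to the consistency, visual quality, and authenticity scores described in the abstract; how the actual CIA-Net computes the cross-modality interaction and the joint probability is detailed in the paper itself.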
Journal introduction:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.