Yang Lu, Shuangyao Han, Zilu Zhou, Zifan Yang, Gaowei Zhang, Shaohui Jin, Xiaoheng Jiang, Mingliang Xu
{"title":"用于人工智能生成图像质量评估的深度跨模态提示学习网络","authors":"Yang Lu , Shuangyao Han , Zilu Zhou , Zifan Yang , Gaowei Zhang , Shaohui Jin , Xiaoheng Jiang , Mingliang Xu","doi":"10.1016/j.displa.2025.103208","DOIUrl":null,"url":null,"abstract":"<div><div>In recent years, multi-modal vision–language pre-trained models have been extensively adopted as foundational components for developing advanced Artificial Intelligence (AI) systems in computer vision applications. Previous approaches have advanced Artificial Intelligence Generated Image Quality Assessment (AGIQA) research via text-based or visual prompt learning, yet most methods remain constrained to a single modality (language or vision), overlooking the interplay between text and image. To address this issue, we propose a Deep Cross-Modal Prompt Learning Network (DCMPLN) for AGIQA. This model introduces a Multimodal Prompt Attention (MPA) module, employing multi-head attention to enhance the integration of textual and visual prompts. Furthermore, an Image Adapter module is incorporated into the visual pathway to extract novel features and fine-tune pre-trained ones using residual-style fusion. Experimental results on multiple generated image datasets demonstrate that the proposed method outperforms existing state-of-the-art image quality assessment models.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"91 ","pages":"Article 103208"},"PeriodicalIF":3.4000,"publicationDate":"2025-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Deep Cross-modal Prompt Learning Network for Artificial Intelligence Generated Image Quality Assessment\",\"authors\":\"Yang Lu , Shuangyao Han , Zilu Zhou , Zifan Yang , Gaowei Zhang , Shaohui Jin , Xiaoheng Jiang , Mingliang Xu\",\"doi\":\"10.1016/j.displa.2025.103208\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>In recent years, multi-modal vision–language pre-trained models have been extensively adopted as foundational components for developing advanced Artificial Intelligence (AI) systems in computer vision applications. Previous approaches have advanced Artificial Intelligence Generated Image Quality Assessment (AGIQA) research via text-based or visual prompt learning, yet most methods remain constrained to a single modality (language or vision), overlooking the interplay between text and image. To address this issue, we propose a Deep Cross-Modal Prompt Learning Network (DCMPLN) for AGIQA. This model introduces a Multimodal Prompt Attention (MPA) module, employing multi-head attention to enhance the integration of textual and visual prompts. Furthermore, an Image Adapter module is incorporated into the visual pathway to extract novel features and fine-tune pre-trained ones using residual-style fusion. 
Experimental results on multiple generated image datasets demonstrate that the proposed method outperforms existing state-of-the-art image quality assessment models.</div></div>\",\"PeriodicalId\":50570,\"journal\":{\"name\":\"Displays\",\"volume\":\"91 \",\"pages\":\"Article 103208\"},\"PeriodicalIF\":3.4000,\"publicationDate\":\"2025-09-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Displays\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0141938225002458\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Displays","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0141938225002458","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
A Deep Cross-modal Prompt Learning Network for Artificial Intelligence Generated Image Quality Assessment
In recent years, multi-modal vision–language pre-trained models have been extensively adopted as foundational components for developing advanced Artificial Intelligence (AI) systems in computer vision applications. Previous approaches have advanced Artificial Intelligence Generated Image Quality Assessment (AGIQA) research via text-based or visual prompt learning, yet most methods remain constrained to a single modality (language or vision), overlooking the interplay between text and image. To address this issue, we propose a Deep Cross-Modal Prompt Learning Network (DCMPLN) for AGIQA. This model introduces a Multimodal Prompt Attention (MPA) module, employing multi-head attention to enhance the integration of textual and visual prompts. Furthermore, an Image Adapter module is incorporated into the visual pathway to extract novel features and fine-tune pre-trained ones using residual-style fusion. Experimental results on multiple generated image datasets demonstrate that the proposed method outperforms existing state-of-the-art image quality assessment models.
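To make the two described components concrete, the sketch below shows one way they might be wired in PyTorch: a multi-head attention block that lets textual and visual prompt tokens condition each other, and a lightweight image adapter that blends newly learned features with frozen pre-trained ones via residual-style fusion. The module names, dimensions, and fusion ratio are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of the MPA-style prompt fusion and Image Adapter described in
# the abstract. All concrete choices (dim=512, 8 heads, ratio=0.2) are assumed
# for illustration only.
import torch
import torch.nn as nn


class MultimodalPromptAttention(nn.Module):
    """Fuses learnable textual and visual prompt tokens with multi-head attention."""

    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_prompts: torch.Tensor, visual_prompts: torch.Tensor) -> torch.Tensor:
        # Textual prompts attend to visual prompts so the two modalities are
        # integrated rather than learned independently.
        fused, _ = self.attn(query=text_prompts, key=visual_prompts, value=visual_prompts)
        return self.norm(text_prompts + fused)


class ImageAdapter(nn.Module):
    """Lightweight adapter in the visual pathway with residual-style fusion."""

    def __init__(self, dim: int = 512, bottleneck: int = 128, ratio: float = 0.2):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.ReLU(inplace=True)
        self.up = nn.Linear(bottleneck, dim)
        self.ratio = ratio  # weight of newly extracted features vs. frozen pre-trained ones

    def forward(self, frozen_features: torch.Tensor) -> torch.Tensor:
        new_features = self.up(self.act(self.down(frozen_features)))
        # Residual-style fusion: blend adapted features with the pre-trained ones.
        return self.ratio * new_features + (1.0 - self.ratio) * frozen_features


if __name__ == "__main__":
    text_prompts = torch.randn(4, 16, 512)    # (batch, prompt tokens, dim)
    visual_prompts = torch.randn(4, 16, 512)
    image_features = torch.randn(4, 512)

    mpa = MultimodalPromptAttention()
    adapter = ImageAdapter()
    print(mpa(text_prompts, visual_prompts).shape)  # torch.Size([4, 16, 512])
    print(adapter(image_features).shape)            # torch.Size([4, 512])
```

In this reading, the fused prompt tokens would steer a frozen vision-language backbone toward quality-relevant cues, while the adapter fine-tunes only a small bottleneck on top of the frozen visual features; both design choices keep the number of trainable parameters small.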
Journal introduction:
Displays is the international journal covering the research and development of display technology, the effective presentation and perception of information, and applications and systems including the display-human interface.
Technical papers on practical developments in display technology provide an effective channel to promote greater understanding and cross-fertilization across the diverse disciplines of the Displays community. Original research papers solving ergonomics issues at the display-human interface advance the effective presentation of information. Tutorial papers covering fundamentals, intended for display technology and human factors engineers new to the field, will also occasionally be featured.