{"title":"基于视觉语言对应的人工智能生成全方位图像的视口独立盲质评估","authors":"Xuelin Liu;Jiebin Yan;Chenyi Lai;Yang Li;Yuming Fang","doi":"10.1109/LSP.2025.3556791","DOIUrl":null,"url":null,"abstract":"The advancement of deep generation technology has significantly enhanced the growth of artificial intelligence-generated content (AIGC). Among these, AI-generated omnidirectional images (AGOIs), hold considerable promise for applications in virtual reality (VR). However, the quality of AGOIs varies widely, and there has been limited research focused on their quality assessment. In this letter, inspired by the characteristics of the human visual system, we propose a novel viewport-independent blindquality assessment method for AGOIs, termed VI-AGOIQA, which leverages vision-language correspondence. Specifically, to minimize the computational burden associated with viewport-based prediction methods for omnidirectional image quality assessment, a set of image patches are first extracted from AGOIs in Equirectangular Projection (ERP) format. Then, the correspondence between visual and textual inputs is effectively learned by utilizing the pre-trained image and text encoders of the Contrastive Language-Image Pre-training (CLIP) model. Finally, a multimodal feature fusion module is applied to predict human visual preferences based on the learned knowledge of visual-language consistency. Extensive experiments conducted on publicly available database demonstrate the promising performance of the proposed method.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1630-1634"},"PeriodicalIF":3.2000,"publicationDate":"2025-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Viewport-Independent Blind Quality Assessment of AI-Generated Omnidirectional Images via Vision-Language Correspondence\",\"authors\":\"Xuelin Liu;Jiebin Yan;Chenyi Lai;Yang Li;Yuming Fang\",\"doi\":\"10.1109/LSP.2025.3556791\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The advancement of deep generation technology has significantly enhanced the growth of artificial intelligence-generated content (AIGC). Among these, AI-generated omnidirectional images (AGOIs), hold considerable promise for applications in virtual reality (VR). However, the quality of AGOIs varies widely, and there has been limited research focused on their quality assessment. In this letter, inspired by the characteristics of the human visual system, we propose a novel viewport-independent blindquality assessment method for AGOIs, termed VI-AGOIQA, which leverages vision-language correspondence. Specifically, to minimize the computational burden associated with viewport-based prediction methods for omnidirectional image quality assessment, a set of image patches are first extracted from AGOIs in Equirectangular Projection (ERP) format. Then, the correspondence between visual and textual inputs is effectively learned by utilizing the pre-trained image and text encoders of the Contrastive Language-Image Pre-training (CLIP) model. Finally, a multimodal feature fusion module is applied to predict human visual preferences based on the learned knowledge of visual-language consistency. 
Extensive experiments conducted on publicly available database demonstrate the promising performance of the proposed method.\",\"PeriodicalId\":13154,\"journal\":{\"name\":\"IEEE Signal Processing Letters\",\"volume\":\"32 \",\"pages\":\"1630-1634\"},\"PeriodicalIF\":3.2000,\"publicationDate\":\"2025-04-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Signal Processing Letters\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10947292/\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Signal Processing Letters","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10947292/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Viewport-Independent Blind Quality Assessment of AI-Generated Omnidirectional Images via Vision-Language Correspondence
The advancement of deep generation technology has significantly accelerated the growth of artificial intelligence-generated content (AIGC). Among these, AI-generated omnidirectional images (AGOIs) hold considerable promise for applications in virtual reality (VR). However, the quality of AGOIs varies widely, and research on their quality assessment remains limited. In this letter, inspired by the characteristics of the human visual system, we propose a novel viewport-independent blind quality assessment method for AGOIs, termed VI-AGOIQA, which leverages vision-language correspondence. Specifically, to minimize the computational burden associated with viewport-based prediction methods for omnidirectional image quality assessment, a set of image patches is first extracted from AGOIs in Equirectangular Projection (ERP) format. Then, the correspondence between visual and textual inputs is effectively learned by utilizing the pre-trained image and text encoders of the Contrastive Language-Image Pre-training (CLIP) model. Finally, a multimodal feature fusion module is applied to predict human visual preferences based on the learned knowledge of visual-language consistency. Extensive experiments conducted on a publicly available database demonstrate the promising performance of the proposed method.
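The three stages sketched in the abstract (ERP patch extraction, frozen CLIP encoders, multimodal fusion) can be illustrated with a brief code sketch. The snippet below is a minimal, hypothetical rendering using the Hugging Face transformers CLIP implementation; the helper names (sample_erp_patches, QualityFusionHead), the patch count, and the quality-related text prompts are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a VI-AGOIQA-style pipeline, assuming Hugging Face CLIP.
# Helper names, patch count, and prompts are illustrative assumptions.
import numpy as np
import torch
import torch.nn as nn
from PIL import Image
from transformers import CLIPModel, CLIPProcessor


def sample_erp_patches(erp: torch.Tensor, n: int = 8, size: int = 224) -> torch.Tensor:
    """Crop n random patches directly from an ERP image (C, H, W),
    avoiding the cost of rendering individual viewports."""
    _, h, w = erp.shape
    ys = torch.randint(0, h - size, (n,))
    xs = torch.randint(0, w - size, (n,))
    return torch.stack([erp[:, y:y + size, x:x + size] for y, x in zip(ys, xs)])


class QualityFusionHead(nn.Module):
    """Hypothetical multimodal fusion module: concatenates pooled image and
    text embeddings and regresses a single quality score."""
    def __init__(self, dim: int):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, img_feat: torch.Tensor, txt_feat: torch.Tensor) -> torch.Tensor:
        return self.mlp(torch.cat([img_feat, txt_feat], dim=-1)).squeeze(-1)


# Frozen, pre-trained CLIP encoders supply the vision-language correspondence.
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
head = QualityFusionHead(dim=clip.config.projection_dim)

erp_image = torch.rand(3, 512, 1024)             # placeholder ERP-format AGOI
patches = sample_erp_patches(erp_image, n=8)     # (8, 3, 224, 224)
patches_pil = [Image.fromarray((p.permute(1, 2, 0).numpy() * 255).astype(np.uint8))
               for p in patches]
prompts = ["a high quality photo", "a low quality photo"]  # assumed quality prompts

inputs = proc(text=prompts, images=patches_pil, return_tensors="pt", padding=True)
with torch.no_grad():
    img_feat = clip.get_image_features(pixel_values=inputs["pixel_values"])      # (8, D)
    txt_feat = clip.get_text_features(input_ids=inputs["input_ids"],
                                      attention_mask=inputs["attention_mask"])   # (2, D)

# Pool over patches and prompts, then fuse to predict a quality score.
score = head(img_feat.mean(dim=0, keepdim=True), txt_feat.mean(dim=0, keepdim=True))
print(f"predicted quality score: {score.item():.3f}")
```

In a full system, presumably only the lightweight fusion head would be trained against human opinion scores from an AGOI quality database, while the CLIP encoders remain frozen.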
Journal Introduction:
The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language and audio processing. Papers published in the Letters can be presented within one year of their appearance at signal processing conferences such as ICASSP, GlobalSIP and ICIP, as well as at several workshops organized by the Signal Processing Society.