Viewport-Independent Blind Quality Assessment of AI-Generated Omnidirectional Images via Vision-Language Correspondence

IF 3.2 · CAS Tier 2 · JCR Q2 · Engineering, Electrical & Electronic
Xuelin Liu; Jiebin Yan; Chenyi Lai; Yang Li; Yuming Fang
IEEE Signal Processing Letters, vol. 32, pp. 1630-1634, published 2025-04-02
DOI: 10.1109/LSP.2025.3556791 · https://ieeexplore.ieee.org/document/10947292/
Citations: 0

Abstract

The advancement of deep generation technology has significantly accelerated the growth of artificial intelligence-generated content (AIGC). Among these, AI-generated omnidirectional images (AGOIs) hold considerable promise for applications in virtual reality (VR). However, the quality of AGOIs varies widely, and there has been limited research focused on their quality assessment. In this letter, inspired by the characteristics of the human visual system, we propose a novel viewport-independent blind quality assessment method for AGOIs, termed VI-AGOIQA, which leverages vision-language correspondence. Specifically, to minimize the computational burden associated with viewport-based prediction methods for omnidirectional image quality assessment, a set of image patches is first extracted from AGOIs in Equirectangular Projection (ERP) format. Then, the correspondence between visual and textual inputs is effectively learned by utilizing the pre-trained image and text encoders of the Contrastive Language-Image Pre-training (CLIP) model. Finally, a multimodal feature fusion module is applied to predict human visual preferences based on the learned knowledge of visual-language consistency. Extensive experiments conducted on a publicly available database demonstrate the promising performance of the proposed method.
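For readers who want a concrete picture of the three-stage pipeline outlined in the abstract (ERP patch extraction, CLIP-based vision-language encoding, and multimodal fusion for score regression), the following Python sketch illustrates the idea using the OpenAI CLIP package (ViT-B/32). The uniform grid patch sampler, the quality prompts, the input file name, and the small fusion MLP are illustrative assumptions made for this sketch only; they are not the exact modules, prompts, or training procedure of VI-AGOIQA.

# Hypothetical sketch of a viewport-independent, CLIP-based quality pipeline.
# Patch sampling, prompt wording, and the fusion head are illustrative
# assumptions; they are NOT the exact components described in the letter.
import torch
import torch.nn as nn
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image


def sample_erp_patches(erp_image: Image.Image, rows: int = 2, cols: int = 4):
    """Crop a uniform grid of patches from an ERP-format image (assumed sampling scheme)."""
    w, h = erp_image.size
    pw, ph = w // cols, h // rows
    return [
        erp_image.crop((c * pw, r * ph, (c + 1) * pw, (r + 1) * ph))
        for r in range(rows)
        for c in range(cols)
    ]


class QualityFusionHead(nn.Module):
    """Toy multimodal fusion head: pools patch embeddings, concatenates the
    text embeddings, and regresses a scalar quality score."""

    def __init__(self, dim: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim * 3, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, patch_feats, text_feats):
        pooled = patch_feats.mean(dim=0, keepdim=True)          # (1, dim)
        fused = torch.cat([pooled, text_feats.reshape(1, -1)], dim=-1)  # (1, 3*dim)
        return self.mlp(fused).squeeze(-1)                       # scalar quality score


device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

erp = Image.open("agoi_erp.png")                                 # hypothetical ERP-format AGOI
patches = torch.stack([preprocess(p) for p in sample_erp_patches(erp)]).to(device)
prompts = clip.tokenize(["a high quality photo", "a low quality photo"]).to(device)

with torch.no_grad():
    patch_feats = model.encode_image(patches).float()            # (N_patches, 512)
    text_feats = model.encode_text(prompts).float()              # (2, 512)

head = QualityFusionHead(dim=patch_feats.shape[-1]).to(device)
score = head(patch_feats, text_feats)                            # untrained head; shapes only
print(score.item())

In practice the fusion head would be trained against subjective quality scores; the sketch is only meant to show how patch-level CLIP image features and quality-related text features can be combined into a single viewport-independent prediction.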
Source Journal
IEEE Signal Processing Letters (Engineering, Electrical & Electronic)
CiteScore: 7.40
Self-citation rate: 12.80%
Articles per year: 339
Review time: 2.8 months
Journal description: The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language and audio processing. Papers published in the Letters can be presented within one year of their appearance in signal processing conferences such as ICASSP, GlobalSIP and ICIP, and also in several workshops organized by the Signal Processing Society.