From Pixels to Rich-Nodes: A Cognition-Inspired Framework for Blind Image Quality Assessment

IF 4.8 | Zone 1, Computer Science | Q2, ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Transactions on Broadcasting Pub Date : 2024-10-07 DOI:10.1109/TBC.2024.3464418

Tian He;Lin Shi;Wenjia Xu;Yu Wang;Weijie Qiu;Houbang Guo;Zhuqing Jiang

IEEE Transactions on Broadcasting, vol. 71, no. 1, pp. 229-239. Full text: https://ieeexplore.ieee.org/document/10706639/

Citations: 0

Abstract

Blind image quality assessment (BIQA) is a subjective perception-driven task, which necessitates assessment results consistent with human cognition. The human cognitive system inherently involves both separation and integration mechanisms. Recent works have witnessed the success of deep learning methods in separating distortion features. Nonetheless, traditional deep-learning-based BIQA methods predominantly depend on a fixed topology to mimic the information integration in the brain, which gives rise to scale sensitivity and low flexibility. To handle this challenge, we delve into the dynamic interactions among neurons and propose a cognition-inspired BIQA model. Drawing insights from the rich club structure in network neuroscience, a graph-inspired feature integrator is devised to reconstruct the network topology. Specifically, we argue that the activity of individual neurons (pixels) tends to exhibit a random fluctuation with ambiguous meaning, while clear and coherent cognition arises from neurons with high connectivity (rich-nodes). Therefore, a self-attention mechanism is employed to establish strong semantic associations between pixels and rich-nodes. Subsequently, we design intra- and inter-layer graph structures to promote feature interaction across spatial and scale dimensions. Such dynamic circuits endow the BIQA method with efficient, flexible, and robust information processing capabilities, so as to achieve assessment results that align more closely with human subjective judgment. Moreover, since the limited samples in existing IQA datasets make models prone to overfitting, we devise two prior hypotheses: frequency prior and ranking prior. The former progressively augments high-frequency components that reflect the distortion degree during the multilevel feature extraction, while the latter seeks to motivate the model’s in-depth comprehension of differences in sample quality. Extensive experiments on five publicly available datasets reveal that the proposed algorithm achieves competitive results.
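To make two ideas from the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that "rich-nodes" can be approximated by a small set of query vectors that attend over pixel features, and that the ranking prior can be approximated by a generic pairwise hinge loss on predicted scores. The function names, the random (untrained) queries, and the `margin` parameter are all hypothetical choices made for illustration; the paper's actual architecture and loss are not specified in the abstract.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def pixels_to_rich_nodes(pixels, num_nodes, rng):
    """Aggregate N pixel features of shape (N, C) into K node features (K, C).

    Assumption: node queries are random here; in a trained model they
    would be learned parameters.
    """
    _, c = pixels.shape
    queries = rng.standard_normal((num_nodes, c))
    scores = queries @ pixels.T / np.sqrt(c)  # (K, N) scaled dot products
    weights = softmax(scores, axis=-1)        # each node's weights sum to 1
    nodes = weights @ pixels                  # (K, C) weighted pixel averages
    return nodes, weights


def pairwise_ranking_loss(pred, mos, margin=0.0):
    """Hinge loss over all pairs (i, j) whose ground-truth score mos[i] > mos[j].

    A pair is penalized when pred[i] - pred[j] falls below the margin.
    Assumption: a generic ranking loss standing in for the ranking prior.
    """
    diff_pred = pred[:, None] - pred[None, :]
    higher = mos[:, None] > mos[None, :]
    losses = np.maximum(0.0, margin - diff_pred)[higher]
    return float(losses.mean()) if losses.size else 0.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pixels = rng.standard_normal((16, 8))   # 16 pixels, 8 channels
    nodes, weights = pixels_to_rich_nodes(pixels, num_nodes=4, rng=rng)
    print(nodes.shape, weights.sum(axis=1))
    print(pairwise_ranking_loss(np.array([3.0, 2.0, 1.0]),
                                np.array([1.0, 2.0, 3.0])))
```

In this sketch each node is a convex combination of pixel features, so a handful of highly connected nodes summarize many noisy pixels, and the ranking loss is zero whenever predicted scores preserve the ground-truth quality ordering.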


Source journal

IEEE Transactions on Broadcasting (Engineering & Technology: Telecommunications)

CiteScore: 9.40

Self-citation rate: 31.10%

Articles published: (not listed)

Review time: 6-12 weeks

Journal description: The Society’s Field of Interest is “Devices, equipment, techniques and systems related to broadcast technology, including the production, distribution, transmission, and propagation aspects.” In addition to this formal FOI statement, which is used to provide guidance to the Publications Committee in the selection of content, the AdCom has further resolved that “broadcast systems includes all aspects of transmission, propagation, and reception.”