Multi-Scale Local and Global Feature Fusion for Blind Quality Assessment of Enhanced Images

Impact Factor 11.1 · CAS Tier 1 (Engineering & Technology) · JCR Q1 · Engineering, Electrical & Electronic
Jingchao Cao;Shuai Zhang;Yutao Liu;Feng Gao;Ke Gu;Guangtao Zhai;Junyu Dong;Sam Kwong
{"title":"Multi-Scale Local and Global Feature Fusion for Blind Quality Assessment of Enhanced Images","authors":"Jingchao Cao;Shuai Zhang;Yutao Liu;Feng Gao;Ke Gu;Guangtao Zhai;Junyu Dong;Sam Kwong","doi":"10.1109/TCSVT.2025.3552086","DOIUrl":null,"url":null,"abstract":"Image enhancement plays a crucial role in computer vision by improving visual quality while minimizing distortion. Traditional methods enhance images through pixel value transformations, yet they often introduce new distortions. Recent advancements in deep learning-based techniques promise better results but challenge the preservation of image fidelity. Therefore, it is essential to evaluate the visual quality of enhanced images. However, existing quality assessment methods frequently encounter difficulties due to the unique distortions introduced by these enhancements, thereby restricting their effectiveness. To address these challenges, this paper proposes a novel blind image quality assessment (BIQA) method for enhanced natural images, termed multi-scale local feature fusion and global feature representation-based quality assessment (MLGQA). This model integrates three key components: a multi-scale Feature Attention Mechanism (FAM) for local feature extraction, a Local Feature Fusion (LFF) module for cross-scale feature synthesis, and a Global Feature Representation (GFR) module using Vision Transformers to capture global perceptual attributes. This synergistic framework effectively captures both fine-grained local distortions and broader global features that collectively define the visual quality of enhanced images. Furthermore, in the absence of a dedicated benchmark for enhanced natural images, we design the Natural Image Enhancement Database (NIED), a large-scale dataset consisting of 8,581 original images and 102,972 enhanced natural images generated through a wide array of traditional and deep learning-based enhancement techniques. Extensive experiments on NIED demonstrate that the proposed MLGQA model significantly outperforms current state-of-the-art BIQA methods in terms of both prediction accuracy and robustness.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"8917-8928"},"PeriodicalIF":11.1000,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10930651/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
引用次数: 0

Abstract

Image enhancement plays a crucial role in computer vision by improving visual quality while minimizing distortion. Traditional methods enhance images through pixel-value transformations, yet they often introduce new distortions. Recent deep learning-based techniques promise better results but can compromise image fidelity, so evaluating the visual quality of enhanced images is essential. However, existing quality assessment methods often struggle with the distinctive distortions introduced by enhancement, which limits their effectiveness. To address these challenges, this paper proposes a novel blind image quality assessment (BIQA) method for enhanced natural images, termed multi-scale local feature fusion and global feature representation-based quality assessment (MLGQA). The model integrates three key components: a multi-scale Feature Attention Mechanism (FAM) for local feature extraction, a Local Feature Fusion (LFF) module for cross-scale feature synthesis, and a Global Feature Representation (GFR) module that uses Vision Transformers to capture global perceptual attributes. This synergistic framework captures both the fine-grained local distortions and the broader global features that together determine the visual quality of enhanced images. Furthermore, in the absence of a dedicated benchmark for enhanced natural images, we construct the Natural Image Enhancement Database (NIED), a large-scale dataset of 8,581 original images and 102,972 enhanced natural images generated by a wide range of traditional and deep learning-based enhancement techniques. Extensive experiments on NIED demonstrate that the proposed MLGQA model significantly outperforms state-of-the-art BIQA methods in both prediction accuracy and robustness.
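The abstract describes a three-branch design: a multi-scale FAM for attention-weighted local features, an LFF module that fuses them across scales, and a ViT-based GFR branch for global perceptual attributes, with the outputs combined into a quality prediction. The paper's exact configuration is not given here, so the following is only a minimal PyTorch-style sketch of how such a local/global fusion regressor could be wired; the layer sizes, the squeeze-and-excitation form of the attention gate, the small transformer used as a stand-in for the ViT backbone, and the regression head are all illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a multi-scale local + global fusion quality model.
# NOTE: this is NOT the authors' MLGQA code; module sizes, the attention
# form, and the regression head are assumptions for illustration only.
import torch
import torch.nn as nn


class FAM(nn.Module):
    """Feature Attention Mechanism (sketch): conv features re-weighted by a channel gate."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.att = nn.Sequential(                      # squeeze-and-excitation style gate
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(out_ch, out_ch // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch // 4, out_ch, 1), nn.Sigmoid())

    def forward(self, x):
        f = self.conv(x)
        return f * self.att(f)                         # attention-weighted local features


class MLGQASketch(nn.Module):
    def __init__(self, dims=(32, 64, 128), embed=128):
        super().__init__()
        # FAM branch: three scales of attention-weighted local features
        self.fams = nn.ModuleList(
            [FAM(i, o) for i, o in zip((3,) + dims[:-1], dims)])
        # LFF: fuse pooled multi-scale local features into one vector
        self.lff = nn.Linear(sum(dims), embed)
        # GFR: a small transformer encoder as a stand-in for the paper's ViT backbone
        self.patch = nn.Conv2d(3, embed, kernel_size=16, stride=16)
        enc_layer = nn.TransformerEncoderLayer(embed, nhead=4, batch_first=True)
        self.gfr = nn.TransformerEncoder(enc_layer, num_layers=2)
        # Regression head maps fused local + global features to a single quality score
        self.head = nn.Sequential(nn.Linear(2 * embed, embed), nn.ReLU(), nn.Linear(embed, 1))

    def forward(self, img):                            # img: (B, 3, H, W)
        local, feats = [], img
        for fam in self.fams:
            feats = fam(feats)
            local.append(feats.mean(dim=(2, 3)))       # global-average-pool each scale
        local = self.lff(torch.cat(local, dim=1))      # cross-scale local fusion
        tokens = self.patch(img).flatten(2).transpose(1, 2)
        global_feat = self.gfr(tokens).mean(dim=1)     # pooled global representation
        return self.head(torch.cat([local, global_feat], dim=1)).squeeze(1)


scores = MLGQASketch()(torch.randn(2, 3, 224, 224))    # -> two predicted quality scores
```

Such a model would typically be trained by regressing the predicted scores against subjective quality labels (e.g. with an L1 or MSE loss); the actual training protocol and loss used by the authors are described in the full paper.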
Source Journal
CiteScore: 13.80
Self-citation rate: 27.40%
Annual publications: 660
Review time: 5 months
Journal description: The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. It encourages general, theoretical, and application-oriented papers on image and video acquisition, representation, presentation, and display; processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; storage, retrieval, indexing, and search; and hardware and software design and implementation.