{"title":"基于多尺度局部和全局特征融合的增强图像盲质量评估","authors":"Jingchao Cao;Shuai Zhang;Yutao Liu;Feng Gao;Ke Gu;Guangtao Zhai;Junyu Dong;Sam Kwong","doi":"10.1109/TCSVT.2025.3552086","DOIUrl":null,"url":null,"abstract":"Image enhancement plays a crucial role in computer vision by improving visual quality while minimizing distortion. Traditional methods enhance images through pixel value transformations, yet they often introduce new distortions. Recent advancements in deep learning-based techniques promise better results but challenge the preservation of image fidelity. Therefore, it is essential to evaluate the visual quality of enhanced images. However, existing quality assessment methods frequently encounter difficulties due to the unique distortions introduced by these enhancements, thereby restricting their effectiveness. To address these challenges, this paper proposes a novel blind image quality assessment (BIQA) method for enhanced natural images, termed multi-scale local feature fusion and global feature representation-based quality assessment (MLGQA). This model integrates three key components: a multi-scale Feature Attention Mechanism (FAM) for local feature extraction, a Local Feature Fusion (LFF) module for cross-scale feature synthesis, and a Global Feature Representation (GFR) module using Vision Transformers to capture global perceptual attributes. This synergistic framework effectively captures both fine-grained local distortions and broader global features that collectively define the visual quality of enhanced images. Furthermore, in the absence of a dedicated benchmark for enhanced natural images, we design the Natural Image Enhancement Database (NIED), a large-scale dataset consisting of 8,581 original images and 102,972 enhanced natural images generated through a wide array of traditional and deep learning-based enhancement techniques. 
Extensive experiments on NIED demonstrate that the proposed MLGQA model significantly outperforms current state-of-the-art BIQA methods in terms of both prediction accuracy and robustness.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"8917-8928"},"PeriodicalIF":11.1000,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multi-Scale Local and Global Feature Fusion for Blind Quality Assessment of Enhanced Images\",\"authors\":\"Jingchao Cao;Shuai Zhang;Yutao Liu;Feng Gao;Ke Gu;Guangtao Zhai;Junyu Dong;Sam Kwong\",\"doi\":\"10.1109/TCSVT.2025.3552086\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Image enhancement plays a crucial role in computer vision by improving visual quality while minimizing distortion. Traditional methods enhance images through pixel value transformations, yet they often introduce new distortions. Recent advancements in deep learning-based techniques promise better results but challenge the preservation of image fidelity. Therefore, it is essential to evaluate the visual quality of enhanced images. However, existing quality assessment methods frequently encounter difficulties due to the unique distortions introduced by these enhancements, thereby restricting their effectiveness. To address these challenges, this paper proposes a novel blind image quality assessment (BIQA) method for enhanced natural images, termed multi-scale local feature fusion and global feature representation-based quality assessment (MLGQA). This model integrates three key components: a multi-scale Feature Attention Mechanism (FAM) for local feature extraction, a Local Feature Fusion (LFF) module for cross-scale feature synthesis, and a Global Feature Representation (GFR) module using Vision Transformers to capture global perceptual attributes. 
This synergistic framework effectively captures both fine-grained local distortions and broader global features that collectively define the visual quality of enhanced images. Furthermore, in the absence of a dedicated benchmark for enhanced natural images, we design the Natural Image Enhancement Database (NIED), a large-scale dataset consisting of 8,581 original images and 102,972 enhanced natural images generated through a wide array of traditional and deep learning-based enhancement techniques. Extensive experiments on NIED demonstrate that the proposed MLGQA model significantly outperforms current state-of-the-art BIQA methods in terms of both prediction accuracy and robustness.\",\"PeriodicalId\":13082,\"journal\":{\"name\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"volume\":\"35 9\",\"pages\":\"8917-8928\"},\"PeriodicalIF\":11.1000,\"publicationDate\":\"2025-03-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10930651/\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10930651/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Multi-Scale Local and Global Feature Fusion for Blind Quality Assessment of Enhanced Images
Image enhancement plays a crucial role in computer vision by improving visual quality while minimizing distortion. Traditional methods enhance images through pixel value transformations, yet they often introduce new distortions. Recent advancements in deep learning-based techniques promise better results but challenge the preservation of image fidelity. Therefore, it is essential to evaluate the visual quality of enhanced images. However, existing quality assessment methods frequently encounter difficulties due to the unique distortions introduced by these enhancements, thereby restricting their effectiveness. To address these challenges, this paper proposes a novel blind image quality assessment (BIQA) method for enhanced natural images, termed multi-scale local feature fusion and global feature representation-based quality assessment (MLGQA). This model integrates three key components: a multi-scale Feature Attention Mechanism (FAM) for local feature extraction, a Local Feature Fusion (LFF) module for cross-scale feature synthesis, and a Global Feature Representation (GFR) module using Vision Transformers to capture global perceptual attributes. This synergistic framework effectively captures both fine-grained local distortions and broader global features that collectively define the visual quality of enhanced images. Furthermore, in the absence of a dedicated benchmark for enhanced natural images, we design the Natural Image Enhancement Database (NIED), a large-scale dataset consisting of 8,581 original images and 102,972 enhanced natural images generated through a wide array of traditional and deep learning-based enhancement techniques. Extensive experiments on NIED demonstrate that the proposed MLGQA model significantly outperforms current state-of-the-art BIQA methods in terms of both prediction accuracy and robustness.
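The abstract describes three cooperating modules: a multi-scale Feature Attention Mechanism (FAM) for local features, a Local Feature Fusion (LFF) module that combines features across scales, and a Global Feature Representation (GFR) module built on Vision Transformers. The paper's actual layer designs are not given here, so the following is only a minimal NumPy sketch of that kind of pipeline: channel attention standing in for FAM, cross-scale pooling and concatenation for LFF, and single-head self-attention over patch tokens for GFR, followed by a toy linear regression head. All shapes, module internals, and names are illustrative assumptions, not the published architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def feature_attention(feat):
    """FAM stand-in: squeeze-and-excite style channel attention over a (C, H, W) map."""
    weights = feat.mean(axis=(1, 2))             # squeeze: per-channel statistic
    weights = 1.0 / (1.0 + np.exp(-weights))     # sigmoid gate
    return feat * weights[:, None, None]         # re-weight channels

def local_feature_fusion(feats):
    """LFF stand-in: pool each scale to a channel vector, concatenate across scales."""
    return np.concatenate([f.mean(axis=(1, 2)) for f in feats])

def global_feature_representation(tokens):
    """GFR stand-in: one self-attention pass over (N, D) patch tokens, then mean-pool."""
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[1])
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)      # row-wise softmax
    return (attn @ tokens).mean(axis=0)

# Toy input: 8-channel feature maps at three spatial scales, plus 16 patch tokens.
scales = [rng.standard_normal((8, s, s)) for s in (32, 16, 8)]
tokens = rng.standard_normal((16, 24))

local_vec = local_feature_fusion([feature_attention(f) for f in scales])
global_vec = global_feature_representation(tokens)

# Quality regression head: a random linear probe, for shape illustration only.
fused = np.concatenate([local_vec, global_vec])
w = rng.standard_normal(fused.shape[0])
score = float(w @ fused)
print(round(score, 3))
```

The sketch only shows how local multi-scale evidence and a global token-level summary can be concatenated into one quality prediction; a trained model would learn all of these weights end-to-end against subjective scores such as those collected for NIED.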
Journal Introduction:
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.