No-reference image quality assessment based on improved vision transformer and transfer learning

IF 3.4 | Zone 3, Engineering & Technology | JCR Q2, ENGINEERING, ELECTRICAL & ELECTRONIC

Signal Processing-Image Communication | Pub Date: 2025-02-11 | DOI: 10.1016/j.image.2025.117282

Bo Zhang, Luoxi Wang, Cheng Zhang, Ran Zhao, Jinlu Sun

Volume 135, Article 117282. Source: https://www.sciencedirect.com/science/article/pii/S0923596525000293

Citations: 0

Abstract

To improve the accuracy and generalization of existing no-reference image quality assessment (NR-IQA) models on small datasets, an NR-IQA model based on an improved vision transformer and transfer learning is proposed. First, ResNet is employed as the feature extraction network to obtain basic perceptual features from the input images, and a Convolutional Block Attention Module (CBAM) is introduced to further strengthen the network's feature extraction. Second, a Transformer encoder is used to regress the multi-layer features, improving the network's ability to capture global image information and predict quality scores. Finally, to overcome the performance limitations of the Transformer model on small datasets, transfer learning is used to compensate for the relatively small size of image quality assessment databases. The model is trained and tested on three small-scale datasets and compared with seven mainstream algorithms. Performance is analyzed along three dimensions using statistical significance tests. The results show that, although the model is not the best at distinguishing similar pairs from significantly different pairs, it remains competitive. It also performs particularly well in assessing quality differences and in Area Under Curve (AUC) evaluation, which suggests strong potential for practical applications.
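The abstract reports an Area Under Curve analysis but does not describe the protocol. As a rough illustration only, the sketch below assumes that each image pair is scored by the absolute difference of the model's predicted quality scores and that subjective data labels each pair as significantly different (1) or similar (0). The function name `pairwise_auc` and the sample values are hypothetical, not taken from the paper. AUC is computed with the standard rank-based (Mann-Whitney) formulation, where ties count as half.

```python
from typing import Sequence


def pairwise_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC: the fraction of (positive, negative) pairs in which
    the positive example has the higher score; ties count as 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical data (assumption, not from the paper): absolute difference in
# predicted quality score for five image pairs, and whether subjective ratings
# mark the pair as significantly different (1) or similar (0).
abs_score_diff = [0.9, 0.7, 0.4, 0.3, 0.1]
significantly_different = [1, 1, 0, 1, 0]

auc = pairwise_auc(abs_score_diff, significantly_different)
print(f"AUC = {auc:.3f}")
```

An AUC of 1.0 would mean every significantly different pair receives a larger predicted score gap than every similar pair, and 0.5 corresponds to chance. The quadratic loop is kept for readability; a sort-based rank computation would be preferable for large pair sets.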

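Source journal

Signal Processing-Image Communication (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 8.40

Self-citation rate: 2.90%

Articles published: 138

Review time: 5.2 months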

About the journal: Signal Processing: Image Communication is an international journal for the development of the theory and practice of image communication. Its primary objectives are the following: To present a forum for the advancement of theory and practice of image communication. To stimulate cross-fertilization between areas similar in nature which have traditionally been separated, for example, various aspects of visual communications and information systems. To contribute to a rapid information exchange between the industrial and academic environments. The editorial policy and the technical content of the journal are the responsibility of the Editor-in-Chief, the Area Editors and the Advisory Editors. The Journal is self-supporting from subscription income and contains a minimum amount of advertisements. Advertisements are subject to the prior approval of the Editor-in-Chief. The journal welcomes contributions from every country in the world. Signal Processing: Image Communication publishes articles relating to aspects of the design, implementation and use of image communication systems. The journal features original research work, tutorial and review articles, and accounts of practical developments. Subjects of interest include image/video coding, 3D video representations and compression, 3D graphics and animation compression, HDTV and 3DTV systems, video adaptation, video over IP, peer-to-peer video networking, interactive visual communication, multi-user video conferencing, wireless video broadcasting and communication, visual surveillance, 2D and 3D image/video quality measures, pre/post processing, video restoration and super-resolution, multi-camera video analysis, motion analysis, content-based image/video indexing and retrieval, face and gesture processing, video synthesis, 2D and 3D image/video acquisition and display technologies, architectures for image/video processing and communication.