Signal Processing-Image Communication — Latest Articles

Towards human society-inspired decentralized DNN inference
IF 3.4 | CAS Region 3 | Engineering & Technology
Signal Processing-Image Communication | Pub Date: 2025-03-13 | DOI: 10.1016/j.image.2025.117306
Dimitrios Papaioannou, Vasileios Mygdalis, Ioannis Pitas
Abstract: In human societies, individuals make their own decisions and may choose whether, and by whom, those decisions are influenced, e.g., by consulting acquaintances or domain experts. At the societal level, overall knowledge is preserved and enhanced through individual empowerment: elaborate consensus protocols have developed over time as societal mechanisms to assess, weight, combine, and isolate individual opinions. In distributed machine learning environments, however, individual AI agents are merely parts of a system in which decisions are made in a centralized, aggregated fashion or require a fixed network topology, a practice prone to security risks and nearly devoid of collaboration. For instance, Byzantine failures can tamper with both the training and inference stages of individual AI agents, significantly reducing overall system performance. Inspired by societal practices, we propose a decentralized inference strategy in which each agent is empowered to make its own decisions by exchanging and aggregating information with other agents in its network. To this end, a "Quality of Inference" (QoI) consensus protocol is proposed, forming a single commonly accepted inference rule applied by every agent. The overall system knowledge and decisions on specific matters can thereby be stored by all agents in a decentralized fashion, employing, e.g., blockchain technology. Our experiments on classification tasks indicate that the proposed approach forms a secure decentralized inference framework that prevents adversaries from tampering with the overall process and achieves performance comparable to centralized decision-aggregation methods.
Citations: 0
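
The abstract does not spell out the QoI rule itself. As a loose illustration of the idea only, the sketch below has each agent broadcast its class probabilities and accept a confidence-weighted majority vote; `Agent`, `qoi_consensus`, and the trust weighting are hypothetical names invented here, not the paper's protocol.

```python
import numpy as np

class Agent:
    """One inference agent holding its own model's class probabilities."""
    def __init__(self, agent_id, probs):
        self.agent_id = agent_id
        self.probs = np.asarray(probs)  # softmax output, shape (num_classes,)

def qoi_consensus(agents, trust=None):
    """Hypothetical quality-of-inference consensus: each agent's vote is
    weighted by its prediction confidence (max probability), optionally
    scaled by a peer-assigned trust score. Returns the agreed class."""
    tally = np.zeros(agents[0].probs.shape[0])
    for a in agents:
        w = a.probs.max()                    # confidence as inference quality
        if trust is not None:
            w *= trust.get(a.agent_id, 1.0)  # down-weight suspect peers
        tally[np.argmax(a.probs)] += w
    return int(np.argmax(tally))

# Three honest agents outvote one (possibly Byzantine) agent.
agents = [Agent(0, [0.1, 0.9]), Agent(1, [0.2, 0.8]),
          Agent(2, [0.3, 0.7]), Agent(3, [0.95, 0.05])]
print(qoi_consensus(agents))  # -> 1
```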

Adaptive spatially regularized target attribute-aware background suppressed deep correlation filter for object tracking
IF 3.4 | CAS Region 3 | Engineering & Technology
Signal Processing-Image Communication | Pub Date: 2025-03-11 | DOI: 10.1016/j.image.2025.117305
Sathiyamoorthi Arthanari, Sathishkumar Moorthy, Jae Hoon Jeong, Young Hoon Joo
Abstract: In recent years, deep feature-based correlation filters have attained impressive performance in robust object tracking. However, they suffer from undesired boundary effects, which reduce tracking performance, and the tracker can drift toward regions that resemble the target under sudden appearance variations and complicated backgrounds. To overcome these issues, we propose an adaptive spatially regularized target attribute-aware background suppressed deep correlation filter (ASTABSCF). A novel adaptive spatial regularization technique learns an efficient spatial weight for the particular object and fast target appearance variations. Specifically, we present a target-aware background suppression method with a dual-regression approach, which uses saliency detection to produce the target mask. Global and target features yield a pair of filters, the global and target filters, whose response maps are integrated at the detection stage to optimize the target response. In addition, a novel adaptive attribute-aware approach emphasizes channel-specific discriminative features by post-processing the observed spatial patterns to reduce the influence of less prominent channels. The learned adaptive spatial attention patterns thus significantly reduce irrelevant information in multi-channel features and improve tracker performance. Finally, we demonstrate the efficiency of ASTABSCF against existing modern trackers on the OTB-2013, OTB-2015, TempleColor-128, UAV-123, LaSOT, and GOT-10K benchmark datasets.
Citations: 0
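
ASTABSCF builds on the discriminative correlation filter framework. The sketch below shows only the generic DCF detection step, correlating a learned filter with a search patch in the Fourier domain, not the paper's adaptive regularization, dual filters, or attribute-aware weighting.

```python
import numpy as np

def dcf_response(filter_h, search_patch):
    """Generic DCF detection step: correlate a learned filter with a search
    patch via the FFT; the response peak gives the target translation."""
    H = np.fft.fft2(filter_h)
    X = np.fft.fft2(search_patch)
    response = np.real(np.fft.ifft2(np.conj(H) * X))  # circular correlation
    dy, dx = np.unravel_index(np.argmax(response), response.shape)
    return response, (dy, dx)

patch = np.random.rand(64, 64)
h = np.roll(patch, (3, 5), axis=(0, 1))   # toy "filter": shifted copy of patch
resp, peak = dcf_response(h, patch)
print(peak)  # peak location encodes the relative shift (modulo wrap-around)
```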

Lost in light field compression: Understanding the unseen pitfalls in computer vision
IF 3.4 | CAS Region 3 | Engineering & Technology
Signal Processing-Image Communication | Pub Date: 2025-03-10 | DOI: 10.1016/j.image.2025.117304
Adam Zizien, Chiara Galdi, Karel Fliegel, Jean-Luc Dugelay
Abstract: Could we be overlooking a fundamental aspect of light fields in our quest for efficient compression? The vast amount of data enclosed in a light field makes compression a necessity. Yet, from an application point of view, the focus is predominantly on visual consumption, while light fields have properties that can potentially serve various other tasks. This paper examines the impact of light field compression on the performance of subsequent computer vision tasks. We investigate the variations in quality across perspectives and their impact on face recognition systems and disparity estimation. Leveraging a diverse dataset of light field images, we thoroughly evaluate the performance of various face recognition algorithms under different conventional and learning-based compression techniques, such as JPEG Pleno, ALVC, and SADN-QVRF. Our findings reveal a noticeable decline in peak recognition performance as compression levels increase, for specific recognition frameworks, along with a significant shift in the recognition threshold at higher degrees of compression. Second, relying on a novel disparity estimation algorithm, we explore the loss of information across light field perspectives. Our results highlight a disconnect between the preservation of visual fidelity and the loss of the minute detail crucial for preserving disparity information in light field images. These findings aim to contribute to the development of efficient compression strategies while emphasizing the delicate balance between compression efficiency, subjective quality, and feature preservation, with the aim of increased accuracy in specialized light field systems.
Citations: 0
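
Codecs such as JPEG Pleno and ALVC are not available as one-line Python calls, so the sketch below illustrates only the general evaluation loop, with ordinary JPEG (via Pillow) as a stand-in codec and a toy placeholder for the face-embedding network. It measures how the embedding of one light-field view drifts as compression strength increases.

```python
import io
import numpy as np
from PIL import Image

def embed(img):
    """Placeholder for a face-recognition embedding network."""
    arr = np.asarray(img, dtype=np.float32) / 255.0
    return arr.mean(axis=(0, 1))  # toy 3-D "embedding"; swap in a real model

def compression_sweep(img, qualities=(90, 50, 10)):
    """Re-encode one light-field view at several JPEG qualities (a stand-in
    for JPEG Pleno/ALVC) and report embedding drift vs. the original."""
    ref = embed(img)
    for q in qualities:
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=q)
        buf.seek(0)
        deg = embed(Image.open(buf))
        print(f"quality={q:3d}  embedding drift={np.linalg.norm(ref - deg):.4f}")

compression_sweep(Image.fromarray(
    (np.random.rand(128, 128, 3) * 255).astype(np.uint8)))
```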

Underwater image quality assessment method via the fusion of visual and structural information
IF 3.4 | CAS Region 3 | Engineering & Technology
Signal Processing-Image Communication | Pub Date: 2025-02-27 | DOI: 10.1016/j.image.2025.117285
Tianhai Chen, Xichen Yang, Tianshu Wang, Nengxin Li, Shun Zhu, Genlin Ji
Abstract: Underwater-captured images often suffer from quality degradation due to the challenging underwater environment, and the resulting information loss significantly affects their usability. Accurately predicting the quality of underwater images is therefore crucial. This study introduces a novel underwater image quality assessment method that combines visual and structural information. First, the CIELab map, gradient feature map, and Mean Subtracted Contrast Normalized (MSCN) feature map of the underwater image are computed. These feature maps are divided into non-overlapping 32×32 patches, and each patch is fed into a corresponding sub-network, allowing a comprehensive description of the changes in visual and structural information caused by quality degradation. The features extracted by the multi-path network are then fused by a feature-fusion network to promote feature complementarity and overcome the limitations of individual features. Finally, the relationship between underwater image quality and the fused features is learned to obtain an evaluation model; overall image quality is measured by combining the quality prediction scores of the individual patches. Experimental results on underwater image datasets demonstrate that the proposed method achieves more accurate and stable quality measurements with a more lightweight structure. Performance comparisons on natural-image and screen-content-image datasets further confirm that the proposed method is more applicable to complex application scenarios than existing methods. The code is open source and available at https://github.com/dart-into/UIQAVSI.
Citations: 0
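
The MSCN map referenced above is the standard transform from the BRISQUE literature. A minimal sketch, assuming SciPy's Gaussian filter for the local mean and contrast, followed by the paper's 32×32 patching step:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mscn(image, sigma=7/6, c=1e-3):
    """Mean Subtracted Contrast Normalized coefficients, as in BRISQUE:
    local mean and local contrast are estimated by Gaussian filtering."""
    image = image.astype(np.float64)
    mu = gaussian_filter(image, sigma)
    sigma_map = np.sqrt(np.abs(gaussian_filter(image * image, sigma) - mu * mu))
    return (image - mu) / (sigma_map + c)

img = np.random.rand(64, 64)
feature_map = mscn(img)
# The paper splits such maps into non-overlapping 32x32 patches:
blocks = [feature_map[i:i+32, j:j+32] for i in range(0, 64, 32)
                                      for j in range(0, 64, 32)]
print(len(blocks))  # 4 patches, each fed to the corresponding sub-network
```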

Advanced transformer for high-noise image denoising: Enhanced attention and detail preservation
IF 3.4 | CAS Region 3 | Engineering & Technology
Signal Processing-Image Communication | Pub Date: 2025-02-26 | DOI: 10.1016/j.image.2025.117286
Jie Zhang, Wenxiao Huang, Miaoxin Lu, Fengxian Wang, Mingdong Zhao, Yinhua Li
Abstract: In image denoising, the transformer model effectively captures global dependencies within an image through its self-attention mechanism, which aids understanding of the overall structure and details during denoising. However, the computational complexity of global self-attention grows quadratically with spatial resolution, making it unsuitable for real-time denoising of high-resolution, high-noise images, while local windows alone neglect long-range pixel correlations. Furthermore, globally weighting the pixels of the input image can smooth away or lose fine details. To enrich structural information and alleviate the computational complexity of global self-attention, we propose an edge-enhanced windowed multi-head self-attention mechanism (EWMSA), which combines edge enhancement with windowed self-attention to reduce computational demands while letting edge features better preserve detail and texture information. To mitigate the effect of ineffective low-weight features, we introduce a feed-forward network with a gating strategy (LGFN) that adjusts pixel weights to prioritize attention on effective pixels. Furthermore, to compensate for the limited global pixel utilization of window-based self-attention, we propose a deformable convolution block (DFCB) that improves the interaction of contextual information and adapts better to texture variations within the image. Extensive experiments demonstrate that the proposed ATHID is competitive with state-of-the-art denoising methods on real-world noise and various synthetic noise levels, effectively addressing the challenges of high-noise image denoising. The code and models are publicly available at https://github.com/zzuli407/ATHID.
Citations: 0
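
The edge-enhancement branch of EWMSA is not detailed in the abstract, but the windowed multi-head self-attention that delivers the complexity reduction is standard (as in Swin-style models). A minimal PyTorch sketch of the window partition plus per-window attention:

```python
import torch
import torch.nn as nn

def window_partition(x, ws):
    """Split (B, H, W, C) features into non-overlapping ws x ws windows,
    returning (num_windows*B, ws*ws, C) so attention runs per window."""
    B, H, W, C = x.shape
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)

class WindowedMSA(nn.Module):
    """Multi-head self-attention restricted to local windows: cost grows
    linearly with image area instead of quadratically."""
    def __init__(self, dim, heads, ws):
        super().__init__()
        self.ws = ws
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                 # x: (B, H, W, C)
        win = window_partition(x, self.ws)
        out, _ = self.attn(win, win, win)
        return out                        # per-window attended tokens

feats = torch.randn(1, 32, 32, 64)
print(WindowedMSA(64, 4, 8)(feats).shape)  # (16, 64, 64): 16 windows of 64 tokens
```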

Visually multimodal depression assessment based on key questions with weighted multi-task learning
IF 3.4 | CAS Region 3 | Engineering & Technology
Signal Processing-Image Communication | Pub Date: 2025-02-18 | DOI: 10.1016/j.image.2025.117279
Peng Wang, Miaomiao Cao, Xianlin Zhu, Suhong Wang, Rongrong Ni, Changchun Yang, Biao Yang
Abstract: In recent years, depression has drawn attention due to its high prevalence and high risk of suicide. Meanwhile, increased pressure on health care and a shortage of mental-health professionals mean that depression often goes undetected and untreated. To address these problems, we propose a visual multimodal fusion network for depression assessment based on weighted multi-task learning (WMTL). First, visual cues from different modalities are collected while subjects answer key questions in a simulated interview, mitigating redundancy. Spatial attention-based feature-embedding modules then extract depression-aware features from the different visual cues. Finally, a hierarchical weighted attention fusion (HAF) module fuses the depression-aware features across modalities to support the assessment. Comprehensive evaluations on the benchmark DAIC-WOZ dataset show that the proposed method performs well, with an average accuracy of 76.96% over ten questions and an F1 score of 0.85. The high performance also indicates a strong correlation between key interview questions and depression levels.
Citations: 0
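
The abstract does not define HAF precisely; as a hypothetical stand-in, the sketch below fuses per-modality feature vectors with learned softmax attention weights, which is the usual shape of such a weighted fusion module. `AttentionFusion` and the modality choices are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Toy stand-in for hierarchical weighted attention fusion (HAF):
    score each modality's feature vector, softmax the scores, and take a
    weighted sum as the fused depression-aware representation."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, feats):                        # feats: (B, M, D)
        w = torch.softmax(self.score(feats), dim=1)  # (B, M, 1) modality weights
        return (w * feats).sum(dim=1)                # (B, D) fused feature

# e.g., 3 visual modalities (face, gaze, pose), 128-D features each
fused = AttentionFusion(128)(torch.randn(4, 3, 128))
print(fused.shape)  # torch.Size([4, 128])
```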

Exploiting rank-based filter pruning for real-time UAV tracking
IF 3.4 | CAS Region 3 | Engineering & Technology
Signal Processing-Image Communication | Pub Date: 2025-02-18 | DOI: 10.1016/j.image.2025.117278
Xucheng Wang, Dan Zeng, Qijun Zhao, Shuiwang Li
Abstract: UAV tracking is an emerging task with wide potential applications in areas such as agriculture, navigation, entertainment, and public security. However, limited computing resources, battery capacity, and maximum payload hinder the deployment of deep learning-based tracking algorithms on UAVs. In contrast to deep trackers, discriminative correlation filter (DCF)-based trackers stand out in the UAV tracking community for their high efficiency, but their precision is usually much lower than that of deep learning-based trackers. Model compression is a promising way to narrow the efficiency-precision gap between DCF-based and deep learning-based trackers, yet it has received little attention in the UAV tracking community. In this paper, we propose the P-SiamFC++ tracker, the first to use rank-based filter pruning to compress the SiamFC++ model, achieving a remarkable balance between efficiency and precision. Our method is general and could inspire further research into UAV tracking with model compression. Extensive experiments on four UAV benchmarks, UAV123@10fps, DTB70, UAVDT, and VisDrone2018, show that P-SiamFC++ significantly outperforms state-of-the-art UAV tracking methods.
Citations: 0
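
Rank-based filter pruning in this spirit (cf. HRank) scores each filter by the average matrix rank of its output feature maps over a batch and discards the lowest-rank filters. The sketch below follows that assumption; the paper's exact criterion and its application inside SiamFC++ are not reproduced.

```python
import torch
import torch.nn as nn

def rank_filters(conv, batch):
    """Score each output filter of `conv` by the average matrix rank of its
    feature maps over a batch (the HRank-style criterion)."""
    fmaps = conv(batch)                              # (B, C_out, H, W)
    ranks = torch.linalg.matrix_rank(fmaps.float())  # (B, C_out)
    return ranks.float().mean(dim=0)                 # per-filter score

def prune_conv(conv, keep):
    """Build a thinner conv layer keeping only the `keep` highest-rank filters."""
    scores = rank_filters(conv, torch.randn(8, conv.in_channels, 32, 32))
    idx = scores.argsort(descending=True)[:keep]
    pruned = nn.Conv2d(conv.in_channels, keep, conv.kernel_size,
                       padding=conv.padding)
    pruned.weight.data = conv.weight.data[idx].clone()
    pruned.bias.data = conv.bias.data[idx].clone()
    return pruned

conv = nn.Conv2d(16, 64, 3, padding=1)
print(prune_conv(conv, keep=32))  # Conv2d(16, 32, ...) keeping top-rank filters
```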

Rethinking erasing strategy on weakly supervised object localization
IF 3.4 | CAS Region 3 | Engineering & Technology
Signal Processing-Image Communication | Pub Date: 2025-02-18 | DOI: 10.1016/j.image.2025.117280
Yuming Fan, Shikui Wei, Chuangchuang Tan, Xiaotong Chen, Dongming Yang, Yao Zhao
Abstract: Weakly supervised object localization (WSOL) is a challenging task that aims to locate object regions in images using only image-level labels as supervision. Early research used erasing strategies to expand the localized regions, but those methods usually adopt a fixed threshold, resulting in over- or under-fitting of the object region. Moreover, the recent pseudo-label paradigm decouples the classification and localization tasks, causing confusion between foreground and background regions. In this paper, we propose the Soft-Erasing (SoE) method for WSOL, comprising two key modules: Adaptive Erasing (AE) and Flip Erasing (FE). The AE module dynamically adjusts the erasing threshold using the object's structural information, while a noise-information module keeps the classifier focused on the foreground region. The FE module effectively decouples object and background information through normalization and inversion. We further introduce an activation loss and a reverse loss to strengthen semantic consistency in foreground regions. Experiments on public datasets demonstrate that our SoE framework significantly improves localization accuracy, achieving GT-Known Loc of 70.86% on ILSVRC and 95.84% on CUB-200-2011.
Citations: 0
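
The AE module's exact formulation is not given in the abstract. The sketch below contrasts the conventional fixed-ratio erasing threshold with a simple statistics-based adaptive one; the adaptive rule shown (mean plus one standard deviation of the CAM) is an illustrative assumption, not the paper's.

```python
import torch

def erase_high_activation(feats, cam, adaptive=True, alpha=0.7):
    """Erase (zero out) spatial positions whose class-activation value
    exceeds a threshold, forcing the classifier to look beyond the most
    discriminative part. The fixed variant uses alpha * max(CAM); the
    adaptive variant (loosely in the spirit of SoE's AE module) ties the
    threshold to the CAM's own statistics instead. Assumed rule, for
    illustration only."""
    if adaptive:
        thr = cam.mean() + cam.std()   # per-image, structure-dependent
    else:
        thr = alpha * cam.max()        # the usual fixed-ratio threshold
    mask = (cam < thr).float()         # 1 = keep, 0 = erase
    return feats * mask.unsqueeze(0)   # broadcast over channels

feats = torch.randn(256, 14, 14)       # backbone feature maps
cam = torch.rand(14, 14)               # class activation map
print(erase_high_activation(feats, cam).shape)  # torch.Size([256, 14, 14])
```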

DA-Net: Deep attention network for biomedical image segmentation
IF 3.4 | CAS Region 3 | Engineering & Technology
Signal Processing-Image Communication | Pub Date: 2025-02-13 | DOI: 10.1016/j.image.2025.117283
Yingyan Gu, Yan Wang, Hua Ye, Xin Shu
Abstract: Deep learning-based image segmentation techniques are of great significance to biomedical image analysis and clinical disease diagnosis; U-Net is one of the classic biomedical image segmentation algorithms and is widely used in the field. In this paper, we propose an improved triplet attention module and embed it into the U-Net framework to form a novel deep attention network, DA-Net, for biomedical image segmentation. Specifically, an additional layer is stacked onto the original U-Net, yielding a six-layer U-shaped network, and the double-convolution module of U-Net is replaced with a composite block, consisting of the improved triplet attention module and a residual concatenation block, to extract abundant valuable features effectively. We redesign the network structure to increase its width and depth and train the model with a pixel position-aware loss, achieving simultaneous gains in mean IoU and average Dice. Extensive experiments were carried out on two publicly available biomedical datasets, the 2018 Data Science Bowl (DSB) and the International Skin Imaging Collaboration (ISIC) 2018 Challenge, and on a self-built fetal cerebellar ultrasound dataset from the Affiliated Hospital of Jiangsu University, named JSUAH-Cerebellum. DA-Net reaches mIoU/mDice of 87.45%/92.98% on JSUAH-Cerebellum, 87.36%/91.37% on the 2018 Data Science Bowl, and 86.75%/91.34% on the ISIC-2018 Challenge. These results demonstrate that DA-Net achieves promising segmentation robustness and generalization ability.
Citations: 0
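
The paper's improvements to triplet attention are not specified in the abstract, so the sketch below is a compact version of the standard triplet attention block it starts from: three rotated branches, each gated through a Z-pool and a 7×7 convolution.

```python
import torch
import torch.nn as nn

class ZPool(nn.Module):
    """Concatenate max- and mean-pooled maps along the channel axis."""
    def forward(self, x):
        return torch.cat([x.max(dim=1, keepdim=True).values,
                          x.mean(dim=1, keepdim=True)], dim=1)

class AttnGate(nn.Module):
    """Z-pool -> 7x7 conv -> sigmoid gate applied to the input."""
    def __init__(self, k=7):
        super().__init__()
        self.pool = ZPool()
        self.conv = nn.Conv2d(2, 1, k, padding=k // 2)

    def forward(self, x):
        return x * torch.sigmoid(self.conv(self.pool(x)))

class TripletAttention(nn.Module):
    """Standard triplet attention: capture (C,H), (C,W), and (H,W)
    interactions by rotating the tensor before a shared gating pattern."""
    def __init__(self):
        super().__init__()
        self.gate_ch = AttnGate()  # channel-height branch
        self.gate_cw = AttnGate()  # channel-width branch
        self.gate_hw = AttnGate()  # plain spatial branch

    def forward(self, x):          # x: (B, C, H, W)
        xh = self.gate_ch(x.permute(0, 2, 1, 3)).permute(0, 2, 1, 3)
        xw = self.gate_cw(x.permute(0, 3, 2, 1)).permute(0, 3, 2, 1)
        return (xh + xw + self.gate_hw(x)) / 3.0

print(TripletAttention()(torch.randn(2, 64, 32, 32)).shape)  # (2, 64, 32, 32)
```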

No-reference image quality assessment based on improved vision transformer and transfer learning
IF 3.4 | CAS Region 3 | Engineering & Technology
Signal Processing-Image Communication | Pub Date: 2025-02-11 | DOI: 10.1016/j.image.2025.117282
Bo Zhang, Luoxi Wang, Cheng Zhang, Ran Zhao, Jinlu Sun
Abstract: To improve the accuracy and generalization of existing no-reference image quality assessment (NR-IQA) models on small datasets, we propose an NR-IQA model based on an improved vision transformer and transfer learning. First, ResNet is employed as the feature extraction network to obtain basic perceptual features from input images, with a Convolutional Block Attention Module (CBAM) introduced to further strengthen feature extraction. Second, a Transformer encoder regresses the multi-layer features, improving the network's ability to capture global image information and predict scores. Finally, to overcome the performance limitations of Transformer models on small datasets, transfer learning is used to compensate for the relatively small size of available image quality assessment databases. The model is trained and tested on three small-scale datasets and compared with seven mainstream algorithms, with performance analyzed across three dimensions using statistical significance tests. The results show that while the model does not perform best at distinguishing similar and significantly different pairs, it remains competitive, and it performs exceptionally well in assessing quality differences and in Area Under Curve (AUC) evaluation, highlighting strong potential for practical applications.
Citations: 0
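
A minimal sketch of the described pipeline, assuming torchvision's pretrained ResNet-50 as the transferable backbone and a Transformer encoder regressing flattened spatial tokens to a quality score; the CBAM insertion and the paper's training details are omitted.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

class TransformerIQA(nn.Module):
    """NR-IQA sketch: pretrained ResNet features (transfer learning) are
    flattened into tokens and regressed to a quality score by a Transformer
    encoder. The paper's CBAM module is omitted here."""
    def __init__(self, dim=256, heads=4, layers=2):
        super().__init__()
        backbone = resnet50(weights=ResNet50_Weights.DEFAULT)
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        self.proj = nn.Conv2d(2048, dim, 1)          # channel reduction
        enc = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, layers)
        self.head = nn.Linear(dim, 1)                # quality-score regressor

    def forward(self, x):                            # x: (B, 3, H, W)
        f = self.proj(self.features(x))              # (B, dim, h, w)
        tokens = f.flatten(2).transpose(1, 2)        # (B, h*w, dim)
        return self.head(self.encoder(tokens).mean(dim=1)).squeeze(-1)

print(TransformerIQA()(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1])
```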