Luminance decomposition and Transformer based no-reference tone-mapped image quality assessment

IF 3.4 · CAS Zone 2 (Engineering Technology) · JCR Q1, Computer Science, Hardware & Architecture

Displays Pub Date : 2024-11-14 DOI:10.1016/j.displa.2024.102881

Zikang Chen, Zhouyan He, Ting Luo, Chongchong Jin, Yang Song

Displays, Volume 85, Article 102881. Journal Article. https://www.sciencedirect.com/science/article/pii/S0141938224002452

Citations: 0

Abstract

Tone-Mapping Operators (TMOs) play a crucial role in converting High Dynamic Range (HDR) images into Tone-Mapped Images (TMIs) with standard dynamic range for optimal display on standard monitors. Nevertheless, TMIs generated by distinct TMOs may exhibit diverse visual artifacts, highlighting the significance of TMI Quality Assessment (TMIQA) methods in predicting perceptual quality and guiding advancements in TMOs. Inspired by luminance decomposition and Transformer, a new no-reference TMIQA method based on deep learning is proposed in this paper, named LDT-TMIQA. Specifically, a TMI will change under the influence of different TMOs, potentially resulting in either over-exposure or under-exposure, leading to structure distortion and changes in texture details. Therefore, we first decompose the luminance channel of a TMI into a base layer and a detail layer that capture structure information and texture information, respectively. Then, they are employed with the TMI collectively as inputs to the Feature Extraction Module (FEM) to enhance the availability of prior information on luminance, structure, and texture. Additionally, the FEM incorporates the Cross Attention Prior Module (CAPM) to model the interdependencies among the base layer, detail layer, and TMI while employing the Iterative Attention Prior Module (IAPM) to extract multi-scale and multi-level visual features. Finally, a Feature Selection Fusion Module (FSFM) is proposed to obtain final effective features for predicting the quality scores of TMIs by reducing the weight of unnecessary features and fusing the features of different levels with equal importance. Extensive experiments on the publicly available TMI benchmark database indicate that the proposed LDT-TMIQA reaches the state-of-the-art level.
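
The base/detail split described in the abstract can be illustrated with a minimal sketch. The abstract does not say which colour transform or smoothing filter the authors use, so everything specific here is an assumption: the Rec. 709 luma weights, the box (mean) filter that stands in for the base-layer filter, and the hypothetical helper names `luminance` and `decompose`. The one property the sketch does rely on is that the detail layer is the residual left after removing the smooth base layer from the luminance channel.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def luminance(rgb):
    """Collapse an H x W x 3 RGB image (values in [0, 1]) to one luminance channel.

    Rec. 709 luma weights are an assumption; the abstract names no transform.
    """
    weights = np.array([0.2126, 0.7152, 0.0722])
    return rgb @ weights


def decompose(lum, radius=2):
    """Split a luminance map into a smooth base layer and a residual detail layer.

    A box (mean) filter is a stand-in for the paper's unspecified smoothing
    filter. The base layer keeps coarse structure; the detail layer keeps
    the fine texture that the smoothing removed.
    """
    k = 2 * radius + 1
    # Replicate edge pixels so the output has the same size as the input.
    padded = np.pad(lum, radius, mode="edge")
    # One k x k window per output pixel: shape (H, W, k, k).
    windows = sliding_window_view(padded, (k, k))
    base = windows.mean(axis=(-2, -1))
    detail = lum - base
    return base, detail


# Illustrative input: a random array standing in for a tone-mapped image.
rng = np.random.default_rng(0)
tmi = rng.random((32, 48, 3))
lum = luminance(tmi)
base, detail = decompose(lum)
```

By construction the two layers sum back to the original luminance, and a perfectly flat image yields an all-zero detail layer. In the method as described, the base layer, the detail layer, and the TMI itself would then be passed together to the feature extraction module; that stage is not sketched here.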

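Source journal

Displays (Engineering Technology: Engineering, Electrical & Electronic)

CiteScore: 4.60

Self-citation rate: 25.60%

Articles published: 138

Review time: 92 days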

Journal description: Displays is the international journal covering the research and development of display technology, its effective presentation and perception of information, and applications and systems including display-human interface. Technical papers on practical developments in Displays technology provide an effective channel to promote greater understanding and cross-fertilization across the diverse disciplines of the Displays community. Original research papers solving ergonomics issues at the display-human interface advance effective presentation of information. Tutorial papers covering fundamentals intended for display technologists and human factors engineers new to the field will also occasionally be featured.