{"title":"Image super-resolution based on multifractals in transfer domain","authors":"Xunxiang Yao, Qiang Wub, Peng Zhange, Fangxun Baod","doi":"10.1016/j.image.2024.117221","DOIUrl":"10.1016/j.image.2024.117221","url":null,"abstract":"<div><div>The goal of image super-resolution technique is to reconstruct high-resolution image with fine texture details from its low-resolution version.On Fourier domain,such fine details are more related to the information in the highfrequency spectrum. Most of existing methods do not have specific modules to handle such high-frequency information adaptively. Thus, they cause edge blur or texture disorder. To tackle the problems, this work explores image super-resolution on multiple sub-bands of the corresponding image, which are generated by NonSubsampled Contourlet Transform (NSCT). Different sub-bands hold the information of different frequency which is then related to the detailedness of information of the given low-resolution image.In this work, such image information detailedness is formulated as image roughness. Moreover, fractals analysis is applied to each sub-band image. Since fractals can mathematically represent the image roughness, it then is able to represent the detailedness (i.e. various frequency of image information). Overall, a multi-fractals formulation is established based on multiple sub-bands image. On each sub-band, different fractals representation is created adaptively. In this way, the image super-resolution process is transformed into a multifractal optimization problem. The experiment result demonstrates the effectiveness of the proposed method in recovering high-frequency details.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"133 ","pages":"Article 117221"},"PeriodicalIF":3.4,"publicationDate":"2025-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143420492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Middle-output deep image prior for blind hyperspectral and multispectral image fusion","authors":"Jorge Bacca , Christian Arcos , Juan Marcos Ramírez , Henry Arguello","doi":"10.1016/j.image.2024.117247","DOIUrl":"10.1016/j.image.2024.117247","url":null,"abstract":"<div><div>Obtaining a low-spatial-resolution hyperspectral image (HS) or low-spectral-resolution multispectral (MS) image from a high-resolution (HR) spectral image is straightforward with knowledge of the acquisition models. However, the reverse process, from HS and MS to HR, is an ill-posed problem known as spectral image fusion. Although recent fusion techniques based on supervised deep learning have shown promising results, these methods require large training datasets involving expensive acquisition costs and long training times. In contrast, unsupervised HS and MS image fusion methods have emerged as an alternative to data demand issues; however, they rely on the knowledge of the linear degradation models for optimal performance. To overcome these challenges, we propose the Middle-Output Deep Image Prior (MODIP) for unsupervised blind HS and MS image fusion. MODIP is adjusted for the HS and MS images, and the HR fused image is estimated at an intermediate layer within the network. The architecture comprises two convolutional networks that reconstruct the HR spectral image from HS and MS inputs, along with two networks that appropriately downscale the estimated HR image to match the available MS and HS images, learning the non-linear degradation models. The network parameters of MODIP are jointly and iteratively adjusted by minimizing a proposed loss function. This approach can handle scenarios where the degradation operators are unknown or partially estimated. To evaluate the performance of MODIP, we test the fusion approach on three simulated spectral image datasets (Pavia University, Salinas Valley, and CAVE) and a real dataset obtained through a testbed implementation in an optical lab. Extensive simulations demonstrate that MODIP outperforms other unsupervised model-based image fusion methods by up to 6 dB in PNSR.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"132 ","pages":"Article 117247"},"PeriodicalIF":3.4,"publicationDate":"2024-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143148378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AggNet: Learning to aggregate faces for group membership verification","authors":"Marzieh Gheisari , Javad Amirian , Teddy Furon , Laurent Amsaleg","doi":"10.1016/j.image.2024.117237","DOIUrl":"10.1016/j.image.2024.117237","url":null,"abstract":"<div><div>In certain applications of face recognition, our goal is to verify whether an individual belongs to a particular group while keeping their identity undisclosed. Existing methods have suggested a process of quantizing pre-computed face descriptors into discrete embeddings and aggregating them into a single representation for the group. However, this mechanism is only optimized for a given closed set of individuals and requires relearning the group representations from scratch whenever the groups change. In this paper, we introduce a deep architecture that simultaneously learns face descriptors and the aggregation mechanism to enhance overall performance. Our system can be utilized for new groups comprising individuals who have never been encountered before, and it easily handles new memberships or the termination of existing memberships. Through experiments conducted on multiple extensive, real-world face datasets, we demonstrate that our proposed method achieves superior verification performance compared to other baseline approaches.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"132 ","pages":"Article 117237"},"PeriodicalIF":3.4,"publicationDate":"2024-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143148377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-granular inter-frame relation exploration and global residual embedding for video-based person re-identification","authors":"Zhiqin Zhu , Sixin Chen , Guanqiu Qi , Huafeng Li , Xinbo Gao","doi":"10.1016/j.image.2024.117240","DOIUrl":"10.1016/j.image.2024.117240","url":null,"abstract":"<div><div>In recent years, the field of video-based person re-identification (re-ID) has conducted in-depth research on how to effectively utilize spatiotemporal clues, which has attracted attention for its potential in providing comprehensive view representations of pedestrians. However, although the discriminability and correlation of spatiotemporal features are often studied, the exploration of the complex relationships between these features has been relatively neglected. Especially when dealing with multi-granularity features, how to depict the different spatial representations of the same person under different perspectives becomes a challenge. To address this challenge, this paper proposes a multi-granularity inter-frame relationship exploration and global residual embedding network specifically designed to solve the above problems. This method successfully extracts more comprehensive and discriminative feature representations by deeply exploring the interactions and global differences between multi-granularity features. Specifically, by simulating the dynamic relationship of different granularity features in long video sequences and using a structured perceptual adjacency matrix to synthesize spatiotemporal information, cross-granularity information is effectively integrated into individual features. In addition, by introducing a residual learning mechanism, this method can also guide the diversified development of global features and reduce the negative impacts caused by factors such as occlusion. Experimental results verify the effectiveness of this method on three mainstream benchmark datasets, significantly surpassing state-of-the-art solutions. This shows that this paper successfully solves the challenging problem of how to accurately identify and utilize the complex relationships between multi-granularity spatiotemporal features in video-based person re-ID.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"132 ","pages":"Article 117240"},"PeriodicalIF":3.4,"publicationDate":"2024-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143148374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GAN-based multi-view video coding with spatio-temporal EPI reconstruction","authors":"Chengdong Lan, Hao Yan, Cheng Luo, Tiesong Zhao","doi":"10.1016/j.image.2024.117242","DOIUrl":"10.1016/j.image.2024.117242","url":null,"abstract":"<div><div>The introduction of multiple viewpoints in video scenes inevitably increases the bitrates required for storage and transmission. To reduce bitrates, researchers have developed methods to skip intermediate viewpoints during compression and delivery, and ultimately reconstruct them using Side Information (SInfo). Typically, depth maps are used to construct SInfo. However, these methods suffer from reconstruction inaccuracies and inherently high bitrates. In this paper, we propose a novel multi-view video coding method that leverages the image generation capabilities of Generative Adversarial Network (GAN) to improve the reconstruction accuracy of SInfo. Additionally, we consider incorporating information from adjacent temporal and spatial viewpoints to further reduce SInfo redundancy. At the encoder, we construct a spatio-temporal Epipolar Plane Image (EPI) and further utilize a convolutional network to extract the latent code of a GAN as SInfo. At the decoder, we combine the SInfo and adjacent viewpoints to reconstruct intermediate views using the GAN generator. Specifically, we establish a joint encoder constraint for reconstruction cost and SInfo entropy to achieve an optimal trade-off between reconstruction quality and bitrate overhead. Experiments demonstrate the significant improvement in Rate–Distortion (RD) performance compared to state-of-the-art methods.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"132 ","pages":"Article 117242"},"PeriodicalIF":3.4,"publicationDate":"2024-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143148373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-layer feature fusion based image style transfer with arbitrary text condition","authors":"Yue Yu, Jingshuo Xing, Nengli Li","doi":"10.1016/j.image.2024.117243","DOIUrl":"10.1016/j.image.2024.117243","url":null,"abstract":"<div><div>Style transfer refers to the conversion of images in two different domains. Compared with the style transfer based on the style image, the image style transfer through the text description is more free and applicable to more practical scenarios. However, the image style transfer method under the text condition needs to be trained and optimized for different text and image inputs each time, resulting in limited style transfer efficiency. Therefore, this paper proposes a multi-layer feature fusion based style transfer method (MlFFST) with arbitrary text condition. To address the problems of distortion and missing semantic content, we also introduce a multi-layer attention normalization module. The experimental results show that the method in this paper can generate stylized results with high quality, good effect and high stability for images and videos. And this method can meet real-time requirements to generate more artistic and aesthetic images and videos.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"132 ","pages":"Article 117243"},"PeriodicalIF":3.4,"publicationDate":"2024-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143148376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive histogram equalization framework based on new visual prior and optimization model","authors":"Shiqi Liu, Qiding Lu, Shengkui Dai","doi":"10.1016/j.image.2024.117246","DOIUrl":"10.1016/j.image.2024.117246","url":null,"abstract":"<div><div>Histogram Equalization (HE) algorithm remains one of the research hotspots in the field of image enhancement due to its computational simplicity. Despite numerous improvements made to HE algorithms, few can comprehensively account for all major drawbacks of HE. To address this issue, this paper proposes a novel histogram equalization framework, which is an adaptive and systematic resolution. Firstly, a novel optimization mathematical model is proposed to seek the optimal controlling parameters for modifying the histogram. Additionally, a new visual prior knowledge, termed Narrow Dynamic Prior (NDP), is summarized, which describes and reveals the subjective perceptual characteristics of the Human Visual System (HVS) for some special types of images. Then, this new knowledge is organically integrated with the new model to expand the application scope of HE. Lastly, unlike common brightness preservation algorithms, a novel method for brightness estimation and precise control is proposed. Experimental results demonstrate that the proposed equalization framework significantly mitigates the major drawbacks of HE, achieving notable advancements in striking a balance between contrast, brightness and detail of the output image. Both objective evaluation metrics and subjective visual perception indicate that the proposed algorithm outperforms other excellent competition algorithms selected in this paper.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"132 ","pages":"Article 117246"},"PeriodicalIF":3.4,"publicationDate":"2024-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143148375","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Visual information fidelity based frame level rate control for H.265/HEVC","authors":"Luheng Jia , Haoqiang Ren , Zuhai Zhang , Li Song , Kebin Jia","doi":"10.1016/j.image.2024.117245","DOIUrl":"10.1016/j.image.2024.117245","url":null,"abstract":"<div><div>Rate control in video coding seeks for various trade-off between bitrate and reconstruction quality, which is closely tied to image quality assessment. The widely used measurement of mean squared error (MSE) is inadequate in describing human visual characteristics, therefore, rate control algorithms based on MSE often fail to deliver optimal visual quality. To address this issue, we propose a frame level rate control algorithm based on a simplified version of visual information fidelity (VIF) as the quality assessment criterion to improve coding efficiency. Firstly, we simplify the VIF and establish its relationship with MSE, which reduce the computational complexity to make it possible for VIF to be used in video coding framework. Then we establish the relationship between VIF-based <span><math><mi>λ</mi></math></span> and MSE-based <span><math><mi>λ</mi></math></span> for <span><math><mi>λ</mi></math></span>-domain rate control including bit allocation and parameter adjustment. Moreover, using VIF-based <span><math><mi>λ</mi></math></span> directly integrates VIF-based distortion into the MSE-based rate–distortion optimized coding framework. Experimental results demonstrate that the coding efficiency of the proposed method outperforms the default frame-level rate control algorithms under distortion metrics of PSNR, SSIM, and VMAF by 3.4<span><math><mtext>%</mtext></math></span>, 4.0<span><math><mtext>%</mtext></math></span> and 3.3<span><math><mtext>%</mtext></math></span> in average. Furthermore, the proposed method reduces the quality fluctuation of the reconstructed video at high bitrate range and improves the bitrate accuracy under hierarchical configuration .</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"131 ","pages":"Article 117245"},"PeriodicalIF":3.4,"publicationDate":"2024-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142759483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transformer-based multiview spatiotemporal feature interactive fusion for human action recognition in depth videos","authors":"Hanbo Wu, Xin Ma, Yibin Li","doi":"10.1016/j.image.2024.117244","DOIUrl":"10.1016/j.image.2024.117244","url":null,"abstract":"<div><div>Spatiotemporal feature modeling is the key to human action recognition task. Multiview data is helpful in acquiring numerous clues to improve the robustness and accuracy of feature description. However, multiview action recognition has not been well explored yet. Most existing methods perform action recognition only from a single view, which leads to the limited performance. Depth data is insensitive to illumination and color variations and offers significant advantages by providing reliable 3D geometric information of the human body. In this study, we concentrate on action recognition from depth videos and introduce a transformer-based framework for the interactive fusion of multiview spatiotemporal features, facilitating effective action recognition through deep integration of multiview information. Specifically, the proposed framework consists of intra-view spatiotemporal feature modeling (ISTFM) and cross-view feature interactive fusion (CFIF). Firstly, we project a depth video into three orthogonal views to construct multiview depth dynamic volumes that describe the 3D spatiotemporal evolution of human actions. ISTFM takes multiview depth dynamic volumes as input to extract spatiotemporal features of three views with 3D CNN, then applies self-attention mechanism in transformer to model global context dependency within each view. CFIF subsequently extends self-attention into cross-attention to conduct deep interaction between different views, and further integrates cross-view features together to generate a multiview joint feature representation. Our proposed method is tested on two large-scale RGBD datasets by extensive experiments to demonstrate the remarkable improvement for enhancing the recognition performance.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"131 ","pages":"Article 117244"},"PeriodicalIF":3.4,"publicationDate":"2024-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142745549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Vocal cord anomaly detection based on Local Fine-Grained Contour Features","authors":"Yuqi Fan , Han Ye , Xiaohui Yuan","doi":"10.1016/j.image.2024.117225","DOIUrl":"10.1016/j.image.2024.117225","url":null,"abstract":"<div><div>Laryngoscopy is a popular examination for vocal cord disease diagnosis. The conventional screening of laryngoscopic images is labor-intensive and depends heavily on the experience of the medical specialists. Automatic detection of vocal cord diseases from laryngoscopic images is highly sought to assist regular image reading. In laryngoscopic images, the symptoms of vocal cord diseases are concentrated in the inner vocal cord contour, which is often characterized as vegetation and small protuberances. The existing classification methods pay little, if any, attention to the role of vocal cord contour in the diagnosis of vocal cord diseases and fail to effectively capture the fine-grained features. In this paper, we propose a novel Local Fine-grained Contour Feature extraction method for vocal cord anomaly detection. Our proposed method consists of four stages: image segmentation to obtain the overall vocal cord contour, inner vocal cord contour isolation to obtain the inner contour curve by comparing the changes of adjacent pixel values, extraction of the latent feature in the inner vocal cord contour by taking the tangent inclination angle of each point on the contour as the latent feature, and the classification module. Our experimental results demonstrate that the proposed method improves the detection performance of vocal cord anomaly and achieves an accuracy of 97.21%.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"131 ","pages":"Article 117225"},"PeriodicalIF":3.4,"publicationDate":"2024-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142700767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}