Global and local collaborative learning for no-reference omnidirectional image quality assessment
Deyang Liu, Lifei Wan, Xiaolin Zhang, Xiaofei Zhou, Caifeng Shan
Signal Processing: Image Communication, Volume 140, Article 117409 (published 2025-09-26). DOI: 10.1016/j.image.2025.117409

Abstract: Omnidirectional images (OIs) have achieved tremendous success in virtual reality applications. With the continuous increase in network bandwidth, users can access massive numbers of OIs from the internet, so evaluating the visual quality of distorted OIs is crucial for ensuring a high-quality immersive experience. Most existing viewport-based OI quality assessment (OIQA) methods overlook the inconsistent distortions within each viewport, and the loss of texture detail caused by viewport downsampling further limits assessment performance. To address these challenges, this paper proposes a global-and-local collaborative learning method for no-reference OIQA. A dual-level learning architecture collaboratively explores the non-uniform distortions and learns a sparse representation of each projected viewport. Specifically, hierarchical features are extracted from each viewport to align with the hierarchical perceptual process of the human visual system (HVS); by aggregating them with a Transformer encoder, the inconsistent spatial features in each viewport can be mined globally. To preserve more texture detail during viewport downsampling, a learnable patch selection paradigm is introduced: by learning the position preferences of local texture variations in each viewport, the method derives a set of sparse image patches that sparsely represent the downsampled viewport. Comprehensive experiments on three publicly available databases illustrate the superiority of the proposed method. The code is available at https://github.com/ldyorchid/GLCNet-OIQA.
Video and text semantic center alignment for text-video cross-modal retrieval
Ming Jin, Huaxiang Zhang, Lei Zhu, Jiande Sun, Li Liu
Signal Processing: Image Communication, Volume 140, Article 117413 (published 2025-09-25). DOI: 10.1016/j.image.2025.117413

Abstract: With the proliferation of video on the Internet, users demand higher precision and efficiency from retrieval technology. Current cross-modal retrieval methods suffer from three main problems: first, the same semantic objects in video and text are not effectively aligned; second, existing neural networks destroy the spatial features of the video when modeling its temporal features; third, the extraction and processing of local text features are overly complex, which increases network complexity. To address these problems, we propose a text-video semantic center alignment network. First, a semantic center alignment module is constructed to promote the alignment of semantic features of the same object across different modalities. Second, a pre-trained BERT with a residual structure is designed to preserve spatial information when inferring temporal information. Finally, the "jieba" library is employed to extract the local key information of the text, thereby simplifying local feature extraction. The effectiveness of the network was evaluated on the MSVD, MSR-VTT, and DiDeMo datasets.
{"title":"Integrated multi-channel approach for speckle noise reduction in SAR imagery using gradient, spatial, and frequency analysis","authors":"Anirban Saha, Harshit Singh, Suman Kumar Maji","doi":"10.1016/j.image.2025.117406","DOIUrl":"10.1016/j.image.2025.117406","url":null,"abstract":"<div><div>Synthetic Aperture Radar (SAR) imagery is inherently marred by speckle noise, which undermines image quality and complicates subsequent analytical endeavors. While numerous strategies have been suggested in existing literature to mitigate this unwanted noise, the challenge of eliminating speckle while conserving subtle structural and textural details inherent in the raw data remains unresolved. In this article, we propose a comprehensive approach combining multi-domain analysis with gradient information processing for SAR. Our method aims to effectively suppress speckle noise while retaining crucial image characteristics. By leveraging multi-domain analysis techniques, we exploit both spatial and frequency domain information to gain a deeper insight into image structures. Additionally, we introduce a novel gradient information processing step that utilizes local gradient attributes to guide the process. Experimental results obtained from synthetic and real SAR imagery illustrate the effectiveness of our approach in terms of speckle noise reduction and preservation of image features. Quantitative assessments demonstrate substantial enhancements in image quality, indicating superior performance compared to current state-of-the-art methods.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"140 ","pages":"Article 117406"},"PeriodicalIF":2.7,"publicationDate":"2025-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145159985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A spatial features and weight adjusted loss infused Tiny YOLO for shadow detection
Akhil Kumar, R. Dhanalakshmi, R. Rajesh, R. Sendhil
Signal Processing: Image Communication, Volume 140, Article 117408 (published 2025-09-22). DOI: 10.1016/j.image.2025.117408

Abstract: Shadow detection in computer vision is challenging because shadows are difficult to distinguish from similarly colored or dark objects, and variations in lighting, background texture, and object shape further complicate accurate detection. This work introduces NS-YOLO, a novel Tiny YOLO variant designed for shadow detection under varying conditions. The architecture comprises a small-scale feature extraction network enhanced with a global attention mechanism, multi-scale spatial attention, and a spatial pyramid pooling block, while preserving effective multi-scale contextual information. In addition, a weight-adjusted CIoU loss function is introduced to improve localization accuracy. The proposed architecture captures both fine details and global context, helping to distinguish shadows from similar dark regions, while the enhanced loss function improves boundary localization and reduces false detections. NS-YOLO is trained end-to-end from scratch on the SBU and ISTD datasets. Experiments show that NS-YOLO achieves a detection accuracy (mAP) of 59.2% while requiring only 35.6 BFLOPs. Compared with existing lightweight YOLO variants (Tiny YOLO and YOLO Nano models proposed between 2017 and 2025), NS-YOLO shows a relative mAP improvement of 2.5-50.1%. These results highlight its efficiency and effectiveness, making it particularly suitable for deployment on resource-limited edge devices in real-time scenarios such as video surveillance and advanced driver-assistance systems (ADAS).
{"title":"RotCLIP: Tuning CLIP with visual adapter and textual prompts for rotation robust remote sensing image classification","authors":"Tiecheng Song, Qi Liu, Anyong Qin, Yin Liu","doi":"10.1016/j.image.2025.117407","DOIUrl":"10.1016/j.image.2025.117407","url":null,"abstract":"<div><div>In recent years, Contrastive Language-Image Pretraining (CLIP) has achieved remarkable success in a range of visual tasks by aligning visual and textual features. However, it remains a challenge to improve the robustness of CLIP for rotated images, especially for remote sensing images (RSIs) where objects can present various orientations. In this paper, we propose a Rotation Robust CLIP model, termed RotCLIP, to achieve the rotation robust classification of RSIs with a visual adapter and dual textual prompts. Specifically, we first compute the original and rotated visual features through the image encoder of CLIP and the proposed Rotation Adapter (Rot-Adapter). Then, we explore dual textual prompts to compute the textual features which describe original and rotated visual features through the text encoder of CLIP. Based on this, we further build a rotation robust loss to limit the distance of the two visual features. Finally, by taking advantage of the powerful image-text alignment ability of CLIP, we build a global discriminative classification loss by combining the prediction results of both original and rotated image-text features. To verify the effect of our RotCLIP, we conduct experiments on three RSI datasets, including the EuroSAT dataset used for scene classification, and the NWPU-VHR-10 and RSOD datasets used for object classification. Experimental results show that the proposed RotCLIP improves the robustness of CLIP against image rotation, outperforming several state-of-the-art methods.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"140 ","pages":"Article 117407"},"PeriodicalIF":2.7,"publicationDate":"2025-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145100067","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A robust JPEG quantization step estimation method for image forensics","authors":"Chothmal Kumawat , Vinod Pankajakshan","doi":"10.1016/j.image.2025.117402","DOIUrl":"10.1016/j.image.2025.117402","url":null,"abstract":"<div><div>Estimating JPEG quantization step size from a JPEG image stored in a lossless format after the decompression (D-JPEG image) is a challenging problem in image forensics. The presence of forgery or additive noise in the D-JPEG image makes the quantization step estimation even more difficult. This paper proposes a novel quantization step estimation method robust to noise addition and forgery. First, we propose a statistical model for the subband DCT coefficients of forged and noisy D-JPEG images. We then show that the periodicity in the difference between the absolute values of rounded DCT coefficients in a subband of a D-JPEG image and those of the corresponding never-compressed image can be used for reliably estimating the JPEG quantization step. The proposed quantization step estimation method is based on this observation. Detailed experimental results reported in this paper demonstrate the robustness of the proposed method against noise addition and forgery. The experimental results also demonstrate that the quantization steps estimated using the proposed method can be used for localizing forgeries in D-JPEG images.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"140 ","pages":"Article 117402"},"PeriodicalIF":2.7,"publicationDate":"2025-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145120363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cut-FUNQUE: An objective quality model for compressed tone-mapped High Dynamic Range videos
Abhinau K. Venkataramanan, Cosmin Stejerean, Ioannis Katsavounidis, Hassene Tmar, Alan C. Bovik
Signal Processing: Image Communication, Volume 139, Article 117405 (published 2025-09-13). DOI: 10.1016/j.image.2025.117405

Abstract: High Dynamic Range (HDR) videos have enjoyed a surge in popularity in recent years due to their ability to represent a wider range of contrast and color than Standard Dynamic Range (SDR) videos. Although HDR video capture has seen increasing popularity because of recent flagship mobile phones such as Apple iPhones, Google Pixels, and Samsung Galaxy phones, a broad swath of consumers still utilize legacy SDR displays that are unable to display HDR videos. As a result, HDR videos must be processed, i.e., tone-mapped, before streaming to a large section of SDR-capable video consumers. However, server-side tone-mapping involves automating decisions regarding the choices of tone-mapping operators (TMOs) and their parameters to yield high-fidelity outputs. Moreover, these choices must be balanced against the effects of lossy compression, which is ubiquitous in streaming scenarios. In this work, we develop a novel, efficient objective video quality model named Cut-FUNQUE that accurately predicts the visual quality of tone-mapped and compressed HDR videos. Finally, we evaluate Cut-FUNQUE on a large-scale crowdsourced database of such videos and show that it achieves state-of-the-art accuracy.
{"title":"Redundant contextual feature suppression for pedestrian detection in dense scenes","authors":"Jun Wang, Lei Wan, Xin Zhang, Xiaotian Cao","doi":"10.1016/j.image.2025.117403","DOIUrl":"10.1016/j.image.2025.117403","url":null,"abstract":"<div><div>Pedestrian detection is one of the important branches of object detection, with a wide range of applications in autonomous driving, intelligent video surveillance, and passenger flow statistics. However, these scenes exhibit high pedestrian density, severe occlusion, and complex redundant contextual information, leading to issues such as low detection accuracy and a high number of false positives in current general object detectors when applied in dense pedestrian scenes. In this paper, we propose an improved Context Suppressed R-CNN method for pedestrian detection in dense scenes, based on the Sparse R-CNN. Firstly, to further enhance the network’s ability to extract deep features in dense scenes, we introduce the CoT-FPN backbone by combining the FPN network with the Contextual Transformer Block. This block replaces the <span><math><mrow><mn>3</mn><mo>×</mo><mn>3</mn></mrow></math></span> convolution in the ResNet backbone. Secondly, addressing the issue that redundant contextual features of instance objects can mislead the localization and recognition of object detection tasks in dense scenes, we propose a Redundant Contextual Feature Suppression Module (RCFSM). This module, based on the convolutional block attention mechanism, aims to suppress redundant contextual information in instance features, thereby improving the network’s detection performance in dense scenes. The test results on the CrowdHuman dataset show that, compared with the Sparse R-CNN algorithm, the proposed algorithm improves the Average Precision (AP) by 1.1% and the Jaccard index by 1.2%, while also reducing the number of model parameters. Code is available at <span><span>https://github.com/davidsmithwj/CS-CS-RCNN</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"139 ","pages":"Article 117403"},"PeriodicalIF":2.7,"publicationDate":"2025-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145049030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Active contour model based on pre- additive bias field fitting image","authors":"Yang Chen, Guirong Weng","doi":"10.1016/j.image.2025.117404","DOIUrl":"10.1016/j.image.2025.117404","url":null,"abstract":"<div><div>With regards to figure with inhomogeneous intensity, the models based on active contour model have been widely used. Compared with the classic models, this paper proposes an optimized additive model which contains the edge structure and inhomogeneous components. Second, by introducing a novel clustering criterion, the value of the bias field can be estimated before iteration, greatly speeding the evloving process and reducing the calculation cost. Thus, an improved energy function is drawn out. Considering the gradient descent flow formula, a novel error function and adaptive parameter are utilized to improve the performance of the data term. Finally, the proposed regularization terms ensure the evloving process is more efficient and accurate. Owing to the above mentioned improvements, the proposed model in this paper has excellent performance of the segmentation in terms of robustness, effectiveness and accuracy.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"139 ","pages":"Article 117404"},"PeriodicalIF":2.7,"publicationDate":"2025-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145095840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NTRF-Net: A fuzzy logic-enhanced convolutional neural network for detecting hidden data in digital images
Ntivuguruzwa Jean De La Croix, Tohari Ahmad, Fengling Han, Royyana Muslim Ijtihadie
Signal Processing: Image Communication, Volume 139, Article 117401 (published 2025-08-28). DOI: 10.1016/j.image.2025.117401

Abstract: Recent advances in steganalysis have focused on detecting hidden information in images, but locating the likely positions of concealed data under advanced adaptive steganography remains a crucial challenge, especially for images shared over public networks. This paper introduces a novel steganalysis approach, NTRF-Net, designed to identify the locations of steganographically altered pixels in digital images. Focusing on the spatial features of an image, NTRF-Net combines stochastic feature selection and fuzzy logic within a convolutional neural network, working in three stages: modification map generation, feature classification, and pixel classification. NTRF-Net achieves high accuracy, reaching 98.2% accuracy and an 86.2% F1 score. ROC curves and AUC values highlight its strong ability to recognize steganographic alterations, outperforming existing benchmarks.