Shenglin Peng , Tong Gao , Shuyi Qu , Zhe Yu , Jun Wang , Jinye Peng
{"title":"An evaluation and study of detail contrast preservation and color consistency in decolorization","authors":"Shenglin Peng , Tong Gao , Shuyi Qu , Zhe Yu , Jun Wang , Jinye Peng","doi":"10.1016/j.dsp.2025.105468","DOIUrl":"10.1016/j.dsp.2025.105468","url":null,"abstract":"<div><div>Grayscale conversion plays a crucial role in image processing, particularly for edge detection and segmentation tasks, where decolorization quality directly impacts subsequent analysis. An ideal decolorization algorithm should be both efficient and robust while preserving color consistency and detail contrast. In this study, we revisit the RTCP (Real-time Contrast-Preserving Decolorization) algorithm and propose three key optimizations: a clustering-guided decolorization approach, a locally adaptive decolorization strategy, and a weight-optimized decolorization method. To enhance solution quality, we implement a constrained particle swarm optimization framework to systematically explore the parameter space. Experimental validation on two standard datasets (Čadík and CSDD) demonstrates that our optimized methods handle diverse decolorization scenarios more effectively while maintaining competitive performance against existing approaches. Recognizing the limitations of current evaluation metrics in assessing detail contrast preservation, we introduce the D-C2G-SSIM metric for more accurate quantitative assessment. 
Comparative results show consistent improvements over the original RTCP algorithm, with the average D-C2G-SSIM score increasing from 0.8331 to 0.8442 on the Čadík dataset and from 0.8696 to 0.8847 on the CSDD dataset, confirming the effectiveness of our approach.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"168 ","pages":"Article 105468"},"PeriodicalIF":2.9,"publicationDate":"2025-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144654626","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jianming Chen , Dingjian Li , Xiangjin Zeng , Yaman Jing , Zhenbo Ren , Jianglei Di , Yuwen Qin
{"title":"Cross-modal information interaction of binocular predictive networks for RGBT tracking","authors":"Jianming Chen , Dingjian Li , Xiangjin Zeng , Yaman Jing , Zhenbo Ren , Jianglei Di , Yuwen Qin","doi":"10.1016/j.dsp.2025.105473","DOIUrl":"10.1016/j.dsp.2025.105473","url":null,"abstract":"<div><div>RGBT tracking aims to aggregate the information from both visible and thermal infrared modalities to achieve visual object tracking. Although plenty of RGBT tracking methods have been proposed, they usually lead to target loss or tracking drift due to the inability to effectively extract useful feature information contained in the multimodal information. To handle this problem, we propose a cross-modal information interaction binocular prediction network. Firstly, a deep, multi-branch feature extraction network is constructed based on Siamese networks to fully exploit the semantic features of images from different optical modalities. The designed image feature enhancement modules are utilized to effectively capture and enhance object features, thereby improving tracking performance. Secondly, a fusion scheme is developed to achieve bidirectional fusion of multimodal features, leveraging complementary cross-modal information to retain distinguishable object characteristics across different modalities. Finally, the anchor-free concept is introduced into the RGBT object tracking domain and combined with a Peak Adaptive Selection (PAS) module to design a binocular prediction network, making the tracker more flexible and versatile. Evaluation experiments conducted on three standard RGBT tracking datasets, namely GTOT, RGBT234, and LasHeR, demonstrate that the modifications made to the baseline Siamese network architecture are effective. The proposed tracker is competitive with existing state-of-the-art (SOTA) methods, achieving comparable results in terms of precision and success rate. 
The key advantage of the proposed method lies in the robust fusion of multimodal features and the flexibility introduced by the anchor-free prediction design, which contribute to the stability of the proposed tracker across various scenarios. Code is released at https://github.com/JMChenl/RGBT-tracking.git.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"168 ","pages":"Article 105473"},"PeriodicalIF":2.9,"publicationDate":"2025-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144654625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Haobiao Fan , Yanbing Chen , Yibo Chen , Zhixin Tie , Hao Sheng , Wei Ke
{"title":"Wavelet-based multi-level information compensation learning for visible-infrared person re-identification","authors":"Haobiao Fan , Yanbing Chen , Yibo Chen , Zhixin Tie , Hao Sheng , Wei Ke","doi":"10.1016/j.dsp.2025.105471","DOIUrl":"10.1016/j.dsp.2025.105471","url":null,"abstract":"<div><div>The main challenge in visible-infrared person re-identification (VI-ReID) is extracting discriminative features from different modalities. Most existing methods focus on minimizing modal differences but overlook the shallow modality-invariant information lost as network depth increases. To address this, we propose the Wavelet-based Multi-level Information Compensation (WMIC) learning method. At multiple network stages, we design an Information Compensation Block (ICB) that applies wavelet decomposition to deep features, producing four wavelet subbands to preserve modality-invariant details and enlarge the receptive field. These subbands are used to compute an attention matrix with shallow features, which is then applied to enhance the local information of the shallow features. Additionally, we represent each person image with two sets of embeddings by introducing a Wavelet Enhancement Block (WEB) to generate an additional embedding. Finally, we use a dual-branch center-guided loss to make the two embeddings complementary, thereby reducing the disparity between infrared and visible images. 
Extensive experiments on the SYSU-MM01, RegDB, and LLCM datasets demonstrate that WMIC outperforms existing methods.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"168 ","pages":"Article 105471"},"PeriodicalIF":2.9,"publicationDate":"2025-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144632906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A unified estimation method for target number and angle-range parameters in FDA-MIMO radar under impulse noise environments","authors":"Menghan Chen, Hongyuan Gao, Yapeng Liu, Helin Sun","doi":"10.1016/j.dsp.2025.105463","DOIUrl":"10.1016/j.dsp.2025.105463","url":null,"abstract":"<div><div>In the presence of impulse noise, estimating both the target number and angle-range parameters of frequency diverse array multiple-input multiple-output (FDA-MIMO) radars is a recognized challenge. In this paper, a novel unified method is introduced for simultaneously estimating both the angle-range parameter and target number of FDA-MIMO radars under impulse noise. To mitigate the effects of impulse noise, a novel adaptive low-order covariance (ALC) method that requires no prior information is proposed. Additionally, a two-dimensional spatial spectrum function is derived using the ALC-MVDR method, leveraging the ability of the minimum-variance distortionless response (MVDR) beamformer to provide a two-dimensional spatial spectrum when the target number is unknown. To resolve the two-dimensional spatial spectrum function, the multimodal quantum sunflower optimization algorithm (MQSFOA) is introduced, which can effectively identify the number of spectral peaks and estimate the peak location without quantization errors. The interrelated Cramér–Rao bound (CRB) is then deduced for evaluating the developed method under conditions of impulse noise. 
Comparative simulation studies demonstrate the significant performance improvements of the presented method.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"168 ","pages":"Article 105463"},"PeriodicalIF":2.9,"publicationDate":"2025-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144654627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dongsheng Yang , Yunfei Guo , Hoseok Sul , Jee Woong Choi , Taek Lyul Song
{"title":"Kernel Gaussian processes based extended target tracking in polar coordinate","authors":"Dongsheng Yang , Yunfei Guo , Hoseok Sul , Jee Woong Choi , Taek Lyul Song","doi":"10.1016/j.dsp.2025.105462","DOIUrl":"10.1016/j.dsp.2025.105462","url":null,"abstract":"<div><div>Most traditional extended target tracking (ETT) methods struggle with the strong nonlinearity introduced by polar coordinate measurements and the unknown maneuvering motion model. These factors either lead to high approximation errors or impose a high computational cost, making accurate and efficient tracking challenging. To address these problems, a kernel Gaussian process-based extended target tracking (KGP-ETT) algorithm is proposed. First, the kernel mean embedding (KME) algorithm embeds the posterior distribution into a high-dimensional reproducing kernel Hilbert space (RKHS) and propagates the state particles through the nonlinear motion model, thereby effectively capturing the inherent nonlinearity. Second, based on the KME method, a kernel-based measurement update is proposed to estimate the target state in a linearized manner by integrating kernel techniques into the Gaussian process (GP) framework. Finally, the computational complexity and the theoretical posterior Cramér-Rao lower bound (PCRLB) of the proposed algorithm are analyzed. Simulation and real-world experiments demonstrate that, during target maneuvering, KGP-ETT achieves up to a 77% reduction in centroid root mean square error (RMSE), a 64% reduction in extent RMSE, and a 148% improvement in intersection over union (IoU) compared to state-of-the-art GP and Variational Bayesian (VB) methods. 
These results highlight the robustness and accuracy of KGP-ETT in handling complex nonlinear ETT problems in polar coordinates.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"168 ","pages":"Article 105462"},"PeriodicalIF":2.9,"publicationDate":"2025-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144632902","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MBDBFormer: a multimodal bridge dual branch Transformer for person re-identification","authors":"Xiangyu Deng, Jing Ding","doi":"10.1016/j.dsp.2025.105481","DOIUrl":"10.1016/j.dsp.2025.105481","url":null,"abstract":"<div><div>A key challenge in the person re-identification (ReID) task is the extraction of robust and discriminative pedestrian features. However, the sensitivity of RGB images to lighting, viewing-angle differences, and occlusion in complex scenes seriously affects the stability of feature extraction. To address these problems, we propose a Multimodal Bridge Dual Branch Transformer (MBDBFormer) that combines a CNN and a Transformer. First, we use the luminance component in RGB and IHS, together with its frequency-domain-transformed low- and high-frequency components (I), as multimodal inputs for image preprocessing, so that the network takes into account both light adaptation and color information. Second, to effectively fuse the feature advantages of the two modalities, the preprocessed image information is input into a bridge branch network consisting of a multilayer downsampling network, which outputs one global feature and four local features through Transformer encoding. Finally, we design the Gated Dynamic Attention and Feature Interaction Mechanism (GDFM), which dynamically allocates attention weights to strengthen the feature expression of discriminative regions such as edges and textures, establishes long-range dependencies between RGB and I features, and achieves complementary optimization of the features of the two modalities. It enables the fused output features to retain the rich color information of the RGB modality while incorporating the illumination robustness of the I modality. 
Experimental results show that the proposed method outperforms state-of-the-art methods on the general-purpose Market1501, DukeMTMC, and MSMT17 datasets and on the Occluded-Duke occlusion dataset, which verifies its effectiveness on the person re-identification task.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"168 ","pages":"Article 105481"},"PeriodicalIF":2.9,"publicationDate":"2025-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144632982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Renhao Jiao , Rui Fan , Weigui Nan , Ming Lu , Xiaojia Yang , Zhiqiang Zhao , Jin Dang , Yanshan Tian , Baiying Dong , Xiaowei He , Xiaoli Luo
{"title":"YOLO-MFDNet: An object detection algorithm for multi-scale remote sensing images","authors":"Renhao Jiao , Rui Fan , Weigui Nan , Ming Lu , Xiaojia Yang , Zhiqiang Zhao , Jin Dang , Yanshan Tian , Baiying Dong , Xiaowei He , Xiaoli Luo","doi":"10.1016/j.dsp.2025.105479","DOIUrl":"10.1016/j.dsp.2025.105479","url":null,"abstract":"<div><div>In remote sensing image target detection, considerable progress has been made, but problems such as complex backgrounds and multi-scale variation remain prominent. To this end, this paper proposes a new detection network, YOLO-MFDNet, which aims to enhance multi-scale target perception and improve detection accuracy. The network includes three key innovations: a multi-scale spatial attention (MSSA) mechanism, a flexible scaling down sampling (FSDown) mechanism, and a distance extended IOU (DXIOU) loss function. MSSA combines multi-scale feature fusion and dual-space dimension one-dimensional coding to enhance the spatial representation ability of the target and efficiently integrate multi-scale information. FSDown combines the advantages of depthwise separable convolution, dilated convolution, and residual connections to enlarge the receptive field while maintaining sensitivity to detail features, balancing detection accuracy and computational efficiency. The DXIOU loss function reduces the risk of false detection and missed detection by introducing scale difference modeling. The effectiveness of YOLO-MFDNet is verified on three public remote sensing datasets. On the DOTA v2.0 dataset, the mAP50 of YOLO-MFDNet is 2.7 % higher than that of the benchmark model; it is 1.1 % higher on the DIOR dataset and 6 % higher on the RSOD dataset, surpassing multiple existing models. 
With little change in the number of parameters, YOLO-MFDNet shows higher detection accuracy on multiple datasets, which verifies its advantage in improving detection performance while preserving computational efficiency. The source code will be available at https://github.com/stevenjiaojiao/YOLO-MFDNet.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"168 ","pages":"Article 105479"},"PeriodicalIF":2.9,"publicationDate":"2025-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144654628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PA-NAFNet: An improved nonlinear activation free network with pyramid attention for single image reflection removal","authors":"Qing Zhang , Yizhong Zhang , Xu Kuang , Yuanbo Zhou , Tong Tong","doi":"10.1016/j.dsp.2025.105474","DOIUrl":"10.1016/j.dsp.2025.105474","url":null,"abstract":"<div><div>Single Image Reflection Removal (SIRR) is an active topic in low-level vision, aiming to eliminate the influence of reflected objects or light sources on image quality. However, due to the ill-posed nature of SIRR and the lack of large-scale real-world reflection image datasets, existing methods degrade on real datasets and suffer from residual reflections. To address these issues, we propose an effective SIRR network called PA-NAFNet. It uses a nonlinear activation free network (NAFNet) as the baseline and incorporates a pyramid attention module to capture long-range pixel interactions. Additionally, during the training phase, a color jittering technique is introduced to increase the diversity of the training dataset, thereby alleviating potential color distortion after reflection removal. Experimental results on multiple reflection removal benchmarks demonstrate the effectiveness of PA-NAFNet. The relevant code is available on this link.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"168 ","pages":"Article 105474"},"PeriodicalIF":2.9,"publicationDate":"2025-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144654629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ziwei Chen , Jingyi Li , Liangzhe Zhang, Mingyang Fu
{"title":"Multi-task learning for accurate and efficient nucleus instance segmentation based on ordinal regression","authors":"Ziwei Chen , Jingyi Li , Liangzhe Zhang, Mingyang Fu","doi":"10.1016/j.dsp.2025.105475","DOIUrl":"10.1016/j.dsp.2025.105475","url":null,"abstract":"<div><div>Nucleus instance segmentation is a critical prerequisite in many microscopy-related research fields, including pathology, drug discovery and functional genomics. The biological tasks involved depend on highly accurate and readily available nucleus segmentation results. However, both manual and existing computer-assisted methods face challenges in balancing accuracy and efficiency due to the diverse sizes, shapes and morphologies of nuclei. Additionally, some nuclei are often clustered and overlapping, which imposes higher demands on segmentation methods. Here, we present an ordinal regression-based nucleus instance segmentation method with multi-task learning that leverages rich instance-aware information encoded within spatial-based ordinal rankings. These ordinal rankings are generated and predicted by our proposed Distance Grading Decrease (DGD) strategy and EfficientNet-based lightweight network, W-Net, respectively. Combined with pixel-level foreground probabilities, these rankings are utilized to separate clustered nuclei and achieve accurate segmentation through a marker-controlled watershed algorithm. 
Our method demonstrates state-of-the-art accuracy and efficiency compared to others, as validated on two independent multi-tissue histology image datasets.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"168 ","pages":"Article 105475"},"PeriodicalIF":2.9,"publicationDate":"2025-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144679555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Junan Zhu, Zhizhe Tang, Zheng Liang, Ping Ma, Chuanjian Wang
{"title":"KANSeg: An efficient medical image segmentation model based on Kolmogorov-Arnold networks for multi-organ segmentation","authors":"Junan Zhu, Zhizhe Tang, Zheng Liang, Ping Ma, Chuanjian Wang","doi":"10.1016/j.dsp.2025.105472","DOIUrl":"10.1016/j.dsp.2025.105472","url":null,"abstract":"<div><div>Multi-organ segmentation methods based on convolutional neural networks have achieved milestones in medical image analysis. However, challenges such as complex backgrounds and blurred boundaries between organs remain, and they lead to poor boundary segmentation. To address this issue, we propose a multi-organ segmentation method based on Kolmogorov-Arnold Networks (KAN), called KANSeg. We develop a KAN-Activated Convolution module (KAN-ACM) to construct both the encoder and decoder, thereby enhancing the learning and interpretation of intricate patterns within multi-organ images. Moreover, to further augment the model's ability to represent nonlinear features, we design a KAN bottleneck module (KAN-BM) to extract more discriminative features. Finally, we conduct comprehensive experiments on two datasets. The proposed KANSeg achieves Dice scores of 79.95% and 90.99% on the Synapse multi-organ dataset (Synapse) and the Automated Cardiac Diagnosis Challenge (ACDC) dataset, respectively. 
The outcomes demonstrate that our method yields more accurate segmentation results compared with state-of-the-art methods.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"168 ","pages":"Article 105472"},"PeriodicalIF":2.9,"publicationDate":"2025-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144632903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}