Digital Signal Processing: Latest Articles, Page 9

SCM-UNet: Spatial-channel Mamba UNet for medical image segmentation

IF 3, Region 3 (Engineering & Technology)

Digital Signal Processing Pub Date : 2025-08-22 DOI: 10.1016/j.dsp.2025.105550

Haijie Yan , Qiuhong Hong , Shoulin Wei , Xiangliang Zhang , Jibin Yin

Abstract: Medical image segmentation plays a critical role in ensuring accurate diagnosis and treatment planning. Despite significant advances in segmentation models based on Convolutional Neural Networks (CNNs) and Transformers, challenges still exist in modeling long-range dependencies and managing computational complexity effectively. To address these challenges, we propose a novel architecture for medical image segmentation, called Spatial-Channel Mamba-UNet (SCM-UNet). This model incorporates the Structured Space Model (SSM) to capture remote dependencies with linear computational complexity, while also leveraging CNNs for local feature extraction. Additionally, we introduce the Spatial-Channel Attention Bridge (SCAB) module, which facilitates multi-scale feature fusion and enhances the model's expressiveness. Comprehensive experimental evaluations on five public benchmark datasets demonstrate that SCM-UNet achieves state-of-the-art (SOTA) performance. Specifically, for skin lesion segmentation, it obtains a mean Intersection over Union (mIoU) of 81.02% on the ISIC 2017 dataset and 81.88% on the ISIC 2018 dataset. To validate its generalizability, SCM-UNet was also evaluated on polyp (Kvasir-SEG, ColonDB) and breast ultrasound (BUSI) segmentation tasks, where it consistently outperformed existing methods, achieving top-ranking mIoU scores of 83.86%, 63.67%, and 69.03%, respectively. Overall, SCM-UNet effectively balances long-range dependency modeling with computational efficiency, offering a robust and versatile solution for various medical image segmentation tasks. This approach represents a promising direction for future research in improving both inference efficiency and accuracy.

Digital Signal Processing, Volume 168, Article 105550
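For readers unfamiliar with the mIoU figures quoted in the SCM-UNet abstract, the following is a minimal sketch of how the metric can be computed for binary masks. It is not taken from the paper: the function names `iou` and `mean_iou` are hypothetical, and averaging over the foreground and background classes is an assumption, since the abstract does not state how the mean is formed.

```python
def iou(pred, target):
    """Intersection over Union of two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def mean_iou(pred, target):
    """Average the IoU of the foreground class and the background class (assumption)."""
    foreground = iou(pred, target)
    background = iou([1 - p for p in pred], [1 - t for t in target])
    return (foreground + background) / 2.0


pred = [1, 1, 0, 0]
target = [1, 0, 0, 0]
print(iou(pred, target))       # foreground IoU
print(mean_iou(pred, target))  # mean over foreground and background
```

Other averaging conventions (per image, or over the foreground class only) give different numbers, so values from this sketch are not directly comparable with the reported scores.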

Citations: 0

Real-time UAV small object detection: An efficient approach using SGFNet and dynamic loss optimization

IF 3, Region 3 (Engineering & Technology)

Digital Signal Processing Pub Date : 2025-08-21 DOI: 10.1016/j.dsp.2025.105543

Yuanteng Cheng , Ting Wang , Wensheng Zhang

Abstract: In recent years, unmanned aerial vehicles (UAVs) have become core tools in traffic monitoring applications due to their flexible perspective control and efficient data acquisition capabilities. However, UAV images commonly exhibit small objects (dimensions <16×16 pixels) and motion blur phenomena, which leads to insufficient semantic representation of shallow features and significantly constrains detection accuracy. Meanwhile, the contradiction between existing networks' high computational complexity and practical scenarios' demand for precise real-time detection makes balancing model accuracy and inference efficiency a critical challenge. To address these issues, this paper proposes SGF-YOLO, a real-time efficient small object detection algorithm. We enhance shallow semantic representation and optimize cross-level feature transfer efficiency through integrating a horizontal attention feature fusion (HAFF) module with a multi-scale feature encoder (MFE). Our work innovatively combines auxiliary bounding boxes with a dynamic non-monotonic focusing mechanism to achieve more robust localization optimization in dense scenes. The proposed method is evaluated on three distinct urban UAV image datasets (VisDrone2021-DET, CARPK, and HazyDet) dedicated to civil applications. Compared with the YOLOv8m baseline model, our approach achieves a 10.3% improvement in mAP50 and 18.6% increase in $AP_{small}$ on the VisDrone2021 dataset, while significantly reducing model size and computational cost (FLOPS). When compared to YOLOv8x, the proposed model demonstrates 68.8% fewer parameters and 40.21% faster inference speed. This study provides a practical solution for UAV-based traffic monitoring systems that effectively balances accuracy, speed, and deployment costs.

Digital Signal Processing, Volume 168, Article 105543

Citations: 0

MSFA-Net: Multiscale feature enhancement and fusion attention-based dual-encoder network for breast ultrasound segmentation

IF 3, Region 3 (Engineering & Technology)

Digital Signal Processing Pub Date : 2025-08-21 DOI: 10.1016/j.dsp.2025.105547

Guoqi Liu, Zuxian Sun, Peiyan Yuan, Sheng Yao, Dong Liu, Baofang Chang

Abstract: Precise segmentation of breast ultrasound images is vital for the early diagnosis of breast cancer and remains a challenging task. Existing segmentation methods based on convolutional neural networks often have limited receptive fields, leading to inaccurate segmentation of blurred boundaries and irregular lesion shapes in breast ultrasound images. Therefore, we propose a multiscale feature enhancement and fusion attention-based dual-encoder network. Our design is as follows: Firstly, we introduce a transformer as a global context-guided encoding branch to establish long-term dependencies. Secondly, with fewer parameters, we propose an efficient auxiliary encoder, which introduces a multiscale feature enhancement module and a fusion attention module. This design facilitates feature interaction across receptive fields, alleviates the gridding problem, and enhances the model's ability to capture fine-grained local and global features. Thirdly, full-stage adaptive cross-attention fusion dynamically adjusts the focus areas and scales, thereby effectively integrating multi-layer information. Extensive experiments are conducted on three publicly available ultrasound datasets, comparing the model with 12 state-of-the-art methods. Notably, our model outperforms the suboptimal method, improving the Dice metric by 2.56% on the BUS dataset and 2.19% on the BUSI-malignant dataset.

Digital Signal Processing, Volume 168, Article 105547
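The Dice metric cited in the MSFA-Net abstract can be illustrated with a short sketch. The function name `dice` and the empty-mask convention are assumptions made here for illustration; the paper's own evaluation code is not shown in the abstract.

```python
def dice(pred, target):
    """Dice coefficient of two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * inter / total


pred = [1, 1, 0, 0]
target = [1, 0, 0, 0]
score = dice(pred, target)
print(score)

# Dice and IoU are monotonically related: Dice = 2 * IoU / (1 + IoU).
iou_value = 0.5  # IoU of the same two masks: 1 shared pixel / 2 pixels in the union
print(abs(score - 2 * iou_value / (1 + iou_value)) < 1e-12)
```

Because of that monotonic relation, ranking methods by Dice or by IoU on a single image gives the same order, although averages over a dataset can differ.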

Citations: 0

DBSCAN-based particle Gaussian mixture filters

IF 3, Region 3 (Engineering & Technology)

Digital Signal Processing Pub Date : 2025-08-21 DOI: 10.1016/j.dsp.2025.105546

Sukkeun Kim , Mengwei Sun , Ivan Petrunin , Hyo-Sang Shin

Abstract: This study addresses nonlinear and non-Gaussian state estimation problems where the particle filter (PF) exhibits the impoverishment issue. This issue arises from the discretisation of the continuous posterior distribution of the state and the use of importance sampling, where the true distribution of the state is unknown. In this study, we propose density-based spatial clustering of applications with noise (DBSCAN)-based particle Gaussian mixture (PGM) filters: the PGM-DS and PGM-DU filters, where DS indicates the PGM filter with DBSCAN and DU indicates the PGM filter with DBSCAN and the unscented transform (UT). These filters assume the posterior distribution of the state to be a Gaussian mixture model (GMM) and sample particles from this GMM. At every time step, the particles are clustered into multiple Gaussian components using DBSCAN, the components are updated with the Kalman/linear minimum mean squared error (LMMSE) update, and the GMM is reconstructed with the updated means and covariances. The proposed filters are tested in three numerical simulation scenarios and compared with other state-of-the-art nonlinear filters. The results show enhanced performance and robustness across the tested simulation scenarios, with lower computational cost compared to the other filters.

Digital Signal Processing, Volume 168, Article 105546
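The cluster-then-update cycle described in this abstract can be sketched in one dimension as follows. This is an illustrative simplification, not the authors' PGM-DS or PGM-DU filter: the state is scalar, the measurement model is assumed to be z = x + noise, DBSCAN noise points are simply dropped, and component weights are left at their cluster fractions. All function names and parameter values are hypothetical.

```python
def dbscan_1d(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    labels = [None] * n

    def neighbours(i):
        return [j for j in range(n) if abs(points[i] - points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point is a border point
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            reach = neighbours(j)
            if len(reach) >= min_pts:
                queue.extend(reach)  # j is a core point, so keep expanding
    return labels


def gaussian_components(points, labels):
    """Return (weight, mean, variance) per cluster; noise points are dropped (assumption)."""
    kept = sum(1 for label in labels if label >= 0)
    components = []
    for c in sorted(set(label for label in labels if label >= 0)):
        members = [p for p, label in zip(points, labels) if label == c]
        mean = sum(members) / len(members)
        var = sum((m - mean) ** 2 for m in members) / len(members)
        components.append((len(members) / kept, mean, var))
    return components


def kalman_update(mean, var, z, r):
    """Scalar Kalman update for a measurement z = x + noise with noise variance r."""
    gain = var / (var + r)
    return mean + gain * (z - mean), (1.0 - gain) * var


particles = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 20.0]
labels = dbscan_1d(particles, eps=0.3, min_pts=2)
components = gaussian_components(particles, labels)
z, r = 4.8, 0.5  # hypothetical measurement and measurement-noise variance
updated = [(w, *kalman_update(m, v, z, r)) for w, m, v in components]
print(labels)
print(updated)
```

A full filter would also reweight the components by their measurement likelihood and resample particles from the updated mixture before the next prediction step; those steps are omitted to keep the sketch short.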

Citations: 0

2D DOA estimation for uniform planar array: A closed-form performance bound framework based on information entropy

IF 3, Region 3 (Engineering & Technology)

Digital Signal Processing Pub Date : 2025-08-21 DOI: 10.1016/j.dsp.2025.105544

Yushan Xie , Dazhuan Xu , Han Zhang , Jiaqi Li , Xiaofei Zhang

Abstract: The theoretical performance bound plays a pivotal role in parameter estimation by establishing benchmarks for evaluating the asymptotic efficiency of estimators. While the classical Cramér-Rao bound (CRB) in non-Bayesian frameworks maintains strict validity only in asymptotic regions, Bayesian approaches can construct globally tight bounds through prior information but suffer from computational bottlenecks induced by high-dimensional integrals and the absence of explicit expressions. This study proposes a novel entropy error bound (EEB) based on information entropy theory for two-dimensional (2D) joint direction of arrival (DOA) estimation and 1D independent estimation in uniform planar arrays (UPAs). By establishing a normalized differential entropy model under signal-to-noise ratio (SNR) partitioning, we derive closed-form analytical solutions for EEB with explicit expressions. These explicit characteristics quantitatively reveal the impact laws of the number of array elements, the root mean square aperture width, and the number of snapshots on estimation performance. Multi-scenario simulations validate that the proposed EEB maintains global tightness across various SNR conditions, thereby providing a universal performance benchmark for arbitrary parameter estimators.

Digital Signal Processing, Volume 168, Article 105544

Citations: 0

Multi-frame superposition framework for OTFS-based ISAC system: A low complexity parameter estimation approach

IF 3, Region 3 (Engineering & Technology)

Digital Signal Processing Pub Date : 2025-08-20 DOI: 10.1016/j.dsp.2025.105540

Jianyu Zhu, Jing Liang

Abstract: This work investigates the parameter estimation in integrated sensing and communications (ISAC) systems based on orthogonal time frequency space (OTFS). We first establish an OTFS-based ISAC system for vehicular networks. To overcome the computational complexity limitations in current estimation approaches, a two-dimensional (2D) correlation structure with superimposed multiple frames is established, avoiding multiple iterations and significantly reducing parameter estimation complexity. Building on the 2D correlation structure, an approximate Maximum Likelihood (ML) algorithm based on multi-frame superposition (ML-MFS) is proposed for range and velocity estimation, achieving equivalent estimation performance to conventional methods with substantially lower complexity. To overcome the performance degradation in multi-target scenarios, we develop an estimation method based on the whale optimization algorithm, named WOA-MFS, enabling parallel optimization of all target parameters and overcoming the limitations of block optimization in ML-MFS. Additionally, the Cramér-Rao Lower Bound (CRLB) is derived to theoretically characterize the estimation performance limit of the proposed framework. Numerical results demonstrate that both ML-MFS and WOA-MFS significantly reduce computational complexity compared to the conventional ML algorithm, with WOA-MFS outperforming ML-MFS across diverse parameter settings, demonstrating its robustness and effectiveness in diverse scenarios. Meanwhile, the communication performance simulation validates the sensing-assisted communication capability of the proposed system.

Digital Signal Processing, Volume 168, Article 105540

Citations: 0

Dynamic pseudo-label learning for semi-supervised ultrasound cardiac segmentation

IF 3, Region 3 (Engineering & Technology)

Digital Signal Processing Pub Date : 2025-08-19 DOI: 10.1016/j.dsp.2025.105558

Feng Gao , Ailian Jiang , Junqi Liu , Jiacheng Wang

Abstract: Semi-supervised ultrasound cardiac segmentation plays a crucial role in the diagnosis and treatment of cardiovascular diseases, reducing the reliance on labeled data. Existing semi-supervised methods mainly rely on consistency regularization and pseudo-labeling strategies, which are susceptible to introducing noise when processing unlabeled ultrasound images, thereby compromising segmentation performance. To tackle this challenge, we propose a method called Dynamic Pseudo-Label Learning (DPL) for Semi-Supervised Ultrasound Cardiac Segmentation. First, to preserve the integrity of cardiac structures and enhance background feature learning, which can reduce the effects of noise, we introduce an adaptive Copy-Paste data augmentation approach. This method primarily performs image mixing in the background regions, calculating the Copy-Paste proportion based on the background size. Second, to prioritize high-confidence ultrasound samples and progressively introduce more challenging samples during training, we design an entropy-based weight optimization strategy that evaluates the prediction entropy of each unlabeled ultrasound image and dynamically selects the proper samples for training. Experimental results demonstrate that the proposed method achieves significant performance improvements across multiple ultrasound cardiac segmentation datasets, confirming its effectiveness and robustness in semi-supervised ultrasound cardiac segmentation tasks.

Digital Signal Processing, Volume 168, Article 105558
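The entropy-based sample selection mentioned in this abstract could look roughly like the sketch below. It is an assumption-laden illustration, not the DPL method itself: the hard threshold, the linear schedule with its 0.3 and 0.7 constants, and the names `mean_entropy` and `select_samples` are all hypothetical, and the paper describes a weighting strategy whose exact form the abstract does not give.

```python
import math


def mean_entropy(probs):
    """Mean binary entropy (in nats) of per-pixel foreground probabilities."""
    eps = 1e-12
    total = 0.0
    for p in probs:
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= p * math.log(p) + (1.0 - p) * math.log(1.0 - p)
    return total / len(probs)


def select_samples(batch, epoch, total_epochs):
    """Keep samples whose entropy is below a threshold that loosens as training proceeds."""
    max_entropy = math.log(2.0)  # entropy of a maximally uncertain binary prediction
    # Hypothetical linear schedule: 30% of the maximum at the start, 100% at the end.
    progress = epoch / max(1, total_epochs - 1)
    threshold = max_entropy * (0.3 + 0.7 * progress)
    return [i for i, probs in enumerate(batch) if mean_entropy(probs) <= threshold]


batch = [
    [0.99, 0.01, 0.98],  # confident prediction, low entropy
    [0.60, 0.50, 0.40],  # uncertain prediction, high entropy
]
print(select_samples(batch, epoch=0, total_epochs=10))  # early: confident sample only
print(select_samples(batch, epoch=9, total_epochs=10))  # late: harder sample admitted too
```

A soft variant would replace the threshold test with a per-sample weight that decreases with entropy, which is closer to the "weight optimization" wording in the abstract.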

Citations: 0

VadCLIP++: Dynamic vision-language model for weakly supervised video anomaly detection

IF 3, Region 3 (Engineering & Technology)

Digital Signal Processing Pub Date : 2025-08-19 DOI: 10.1016/j.dsp.2025.105560

Long Liu , Jianjun Li , Guang Li , Yunfeng Zhai , Ming Zhang

Abstract: In the realm of weakly supervised video anomaly detection (WSVAD), the integration of Contrastive Language-Image Pre-training (CLIP) models has demonstrated substantial benefits, highlighting that learning through textual prompts can effectively distinguish between anomalous events and enhance the expression of visual features. However, existing CLIP models in video anomaly detection tasks rely solely on static textual prompts, neglecting the temporal continuity of anomalous behaviors, which limits the understanding of dynamic anomalous behaviors. To address this, this paper proposes a dynamic learnable text prompting mechanism, which supervises and learns the frame difference features of adjacent consecutive frames in the video to capture the dynamic changes of anomalous behaviors. At the same time, by incorporating static textual prompts, the model precisely focuses on anomalous behaviors within individual frames, making the textual prompts in different states more sensitive to the temporal and spatial action details of the video. In addition, a Spatial Feature Selection Module (SFSM) is proposed, which leverages random sampling and TOP-k selection mechanisms to enhance the model's generalization ability in anomalous regions of video frames, while modeling the spatial relationships of the global context. The dynamic learnable text prompt branch handles temporal anomalies by extracting inter-frame difference features, while the static text prompt branch optimizes in-frame anomaly localization under the influence of the SFSM module. The dual-branch collaboration establishes complementary spatiotemporal representations, collectively enhancing detection performance. Experimental results show that on the XD-Violence and UCF-Crime datasets, the proposed method achieves 85.03% AP and 88.12% AUC, thoroughly validating its effectiveness in anomaly detection tasks.

Digital Signal Processing, Volume 168, Article 105560
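Two building blocks named in this abstract, inter-frame difference features and top-k selection, can be sketched generically as below. This is not the VadCLIP++ implementation: real frame features would come from a CLIP visual encoder, the L1 motion score is an assumption, and applying top-k to transitions rather than to spatial regions is done purely for illustration. The function names are hypothetical.

```python
def frame_differences(features):
    """Difference between each pair of adjacent frame feature vectors."""
    return [
        [b - a for a, b in zip(prev, nxt)]
        for prev, nxt in zip(features, features[1:])
    ]


def top_k_indices(scores, k):
    """Indices of the k largest scores, highest first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Toy per-frame feature vectors (stand-ins for encoder outputs).
features = [[1.0, 2.0], [1.5, 2.0], [4.0, 0.0]]
diffs = frame_differences(features)

# Assumed motion score: L1 magnitude of each difference vector.
motion = [sum(abs(v) for v in d) for d in diffs]
print(diffs)
print(top_k_indices(motion, k=1))  # transition with the largest change
```

With T frames the difference sequence has T - 1 entries, so a model that pairs it with per-frame features has to pad or align the two sequences.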

Citations: 0

Codesign of constant modulus waveform and receive filters for FDA-MIMO radar

IF 3, Region 3 (Engineering & Technology)

Digital Signal Processing Pub Date : 2025-08-18 DOI: 10.1016/j.dsp.2025.105542

Qiping Zhang , Xin Tai , Yongfeng Zuo , Hua Wang , Jinfeng Hu

Abstract: The joint design of waveform and receive filter is one of the important technologies currently being studied in frequency diverse array multi-input multi-output (FDA-MIMO) radar. The problem model studied in this paper is to maximize the signal-to-interference-noise ratio (SINR) of the system under the constant modulus constraint of the waveform and the norm constraint of the filter. The problem is non-convex, which brings challenges to its solution. Existing methods use relaxation-based methods to solve this problem, but this will inevitably introduce relaxation errors. To solve the above problems, we notice that the complex circle-sphere manifold space (CCSMS) can naturally satisfy the constant modulus constraint and norm constraint. Based on this feature, the problem becomes an unconstrained optimization problem on the CCSMS manifold, eliminating the need for relaxation. The Riemannian conjugate gradient algorithm can then be directly applied to solve the waveform and receive filter in parallel. We compared it with the existing methods through simulation: 1) SINR was improved by 4 dB; 2) the computational complexity was reduced compared with existing methods.

Digital Signal Processing, Volume 168, Article 105542
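To make concrete why a circle-and-sphere manifold satisfies both constraints by construction, the sketch below shows the usual normalisation maps that pull an arbitrary point back onto each constraint set. It is a generic illustration rather than the paper's algorithm: the Riemannian conjugate gradient iterations are not reproduced, and the function names are hypothetical.

```python
import math


def project_constant_modulus(x):
    """Map each complex entry to the unit circle, keeping only its phase."""
    # A zero entry has no phase; mapping it to 1+0j is an arbitrary choice.
    return [v / abs(v) if abs(v) > 0 else 1 + 0j for v in x]


def project_unit_norm(w):
    """Scale a complex vector to unit Euclidean norm."""
    norm = math.sqrt(sum(abs(v) ** 2 for v in w))
    if norm == 0:
        raise ValueError("cannot normalise the zero vector")
    return [v / norm for v in w]


waveform = project_constant_modulus([3 + 4j, -2j, 0j])
receive_filter = project_unit_norm([3 + 0j, 4j])
print(waveform)        # every entry has modulus 1
print(receive_filter)  # squared moduli sum to 1
```

In a manifold optimiser, maps of this kind typically serve as the retraction applied after each gradient step, so every iterate stays feasible and no relaxation of the constraints is needed.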

Citations: 0

Diverse human motion prediction via sampling on Grassmann manifold

IF 3, Region 3 (Engineering & Technology)

Digital Signal Processing Pub Date : 2025-08-18 DOI: 10.1016/j.dsp.2025.105539

Hanqing Tong , Wenwen Ding , Qing Li , Chongyang Ding

Abstract: Existing research captures the multimodal nature of human motion through likelihood-based sampling on latent space. However, in the actual motion process, the high diversity of human training data often leads to mode collapse. Additionally, the training process is complicated by a large number of parameters. In this paper, a novel yet effective sampling method on the Grassmann manifold is proposed to enhance the accuracy and diversity of prediction. A linear dynamical system is utilized to model the spatiotemporal dependence in human motion sequences. The corresponding orthogonal basis vectors are connected as residuals of the encoded motion sequences. The feature space is enhanced by these orthogonal basis vectors. Subsequently, a series of random orthogonal vectors are sampled on the Grassmann manifold using Poisson weights and a reparameterization trick. The method is compared with the latest methods on the Human3.6M dataset and the HumanEva-I dataset. The results show that the method effectively improves the diversity metric by 3% and 10% with a low average error. Code and pre-trained models are available at: https://github.com/Hq-Tong/SOGM.

Digital Signal Processing, Volume 168, Article 105539
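As background for the phrase "random orthogonal vectors sampled on the Grassmann manifold", the sketch below draws k orthonormal vectors whose span is one point on the Grassmann manifold of k-dimensional subspaces. It uses plain Gram-Schmidt on Gaussian samples, which is a swapped-in textbook technique: the paper's Poisson weights and reparameterization trick are not reproduced, and `random_orthonormal` is a hypothetical name.

```python
import random


def random_orthonormal(dim, k, rng):
    """Draw k orthonormal vectors in R^dim by Gram-Schmidt on Gaussian samples."""
    if k > dim:
        raise ValueError("cannot have more than dim orthonormal vectors")
    basis = []
    while len(basis) < k:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Remove the components along the vectors already accepted.
        for b in basis:
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = sum(x * x for x in v) ** 0.5
        if norm > 1e-10:  # reject numerically degenerate draws
            basis.append([x / norm for x in v])
    return basis


basis = random_orthonormal(dim=5, k=3, rng=random.Random(0))
for row in basis:
    print([round(x, 3) for x in row])
```

Two different bases with the same span represent the same Grassmann point, so any downstream use should depend only on the subspace, for example through the projection matrix built from the basis.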

Citations: 0