Cascaded Physical-constraint Conditional Variational Auto Encoder with socially-aware diffusion for pedestrian trajectory prediction
Authors: Haojie Chen, Zhuo Wang, Hongde Qin, Xiaokai Mu
Journal: Pattern Recognition, volume 166, Article 111667 (published 2025-04-12)
Impact factor: 7.5 (JCR Q1, Computer Science, Artificial Intelligence)
DOI: 10.1016/j.patcog.2025.111667
URL: https://www.sciencedirect.com/science/article/pii/S0031320325003279
Pedestrian trajectory prediction is a crucial prerequisite for tasks such as autonomous driving and human–robot interaction. Existing methods mainly use deep generative models to predict multi-modal future trajectories. However, the inherent uncertainty of pedestrian movement makes it difficult for these models to generate accurate and plausible trajectories. In this paper, we propose a two-stage trajectory prediction network termed CPSD. In the first stage, we propose a Cascaded Physical-constraint Conditional Variational Auto Encoder, which cascades Differentiable Physical Constraint Conditional Variational Auto Encoders to predict the trajectory coordinates in a stepwise manner; this improves the interpretability of the deep generative network and alleviates the accumulation of prediction error over time. In the second stage, a Socially-aware Diffusion Model refines the initial trajectory generated in the first stage. By introducing a non-local attention mechanism and constructing a social mask, we integrate pedestrian social interactions into the diffusion model, enabling it to produce more realistic and plausible multi-modal pedestrian trajectories. Extensive experiments on the public datasets SDD and ETH/UCY demonstrate that CPSD achieves more promising trajectory predictions than other state-of-the-art trajectory prediction algorithms.
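The abstract does not say how the social mask is built or how it enters the attention mechanism. As a rough illustration only, the sketch below assumes the mask is a distance threshold between pedestrians and that masked-out pairs receive zero attention weight. The names `social_mask`, `masked_attention_weights`, and `radius` are hypothetical and are not taken from the paper.

```python
import math


def social_mask(positions, radius):
    """mask[i][j] is True when pedestrian j lies within `radius` of pedestrian i.

    Assumption: a plain Euclidean distance threshold; the paper may use a
    different criterion.
    """
    return [[math.dist(p, q) <= radius for q in positions] for p in positions]


def masked_attention_weights(scores, mask):
    """Row-wise softmax over `scores`, giving zero weight where mask is False."""
    weights = []
    for row_scores, row_mask in zip(scores, mask):
        kept = [s for s, m in zip(row_scores, row_mask) if m]
        # Each pedestrian is at distance 0 from itself, so `kept` is non-empty.
        top = max(kept)
        exps = [math.exp(s - top) if m else 0.0
                for s, m in zip(row_scores, row_mask)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Three pedestrians on a line: two close together, one far away.
positions = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
mask = social_mask(positions, radius=2.0)

# Uniform raw attention scores, so the mask alone decides the weights.
scores = [[0.0, 0.0, 0.0] for _ in positions]
weights = masked_attention_weights(scores, mask)
```

With these inputs the two nearby pedestrians split their attention evenly between each other, while the distant pedestrian attends only to itself. In a real model the scores would come from learned query and key projections rather than constants.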
About the journal:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.