Transformer with Global and Local Interaction for Pedestrian Trajectory Prediction
Lingyue Kong, Kun Jiang, Yuanda Wang
2022 6th Asian Conference on Artificial Intelligence Technology (ACAIT), published 2022-12-09
DOI: 10.1109/ACAIT56212.2022.10137826 (https://doi.org/10.1109/ACAIT56212.2022.10137826)
Accurate pedestrian trajectory prediction is crucial for autonomous driving systems and service robots. In this paper, we further analyze pedestrian interaction patterns and propose GL-Net, a novel graph-based model with two encoders and one decoder. The single sequence encoder first models the short-term spatio-temporal interaction between pedestrians within a single frame. In this module, a graph attention network (GAT) and a graph-based transformer run in parallel to extract local and global spatial interaction features, respectively. The long sequence encoder then generates a set of candidate trajectories by extracting the full temporal dependence in each pedestrian's historical trajectory and inferring long-term pedestrian intention. To account for the inherent uncertainty caused by the multimodal nature of pedestrian trajectories, we add Gaussian noise to our spatio-temporal embedding. Evaluations on the ETH and UCY datasets show that our model outperforms previous graph-based models. Moreover, our model produces more reasonable trajectories at points of social interaction and strikes a better balance between capturing spatial interaction features and generating temporal sequences than other models.
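The parallel local/global idea in the abstract can be illustrated with a minimal sketch. This is not the GL-Net implementation: the abstract gives no layer definitions, so both branches here use plain, parameter-free dot-product attention as a stand-in (a real GAT uses learned projections and its own scoring function). The function names `attend` and `encode_frame`, the neighbourhood `radius`, the fusion by concatenation, and `noise_std` are all assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, positions, radius=None):
    """Dot-product attention over the pedestrians in one frame.

    radius=None -> each pedestrian attends to everyone (global branch).
    radius=r    -> attention is restricted to pedestrians within distance r
                   (local branch; a stand-in for a GAT neighbourhood).
    """
    n = len(features)
    dim = len(features[0])
    out = []
    for i in range(n):
        # A pedestrian is always its own neighbour, so this list is never empty.
        neighbours = [
            j for j in range(n)
            if radius is None or math.dist(positions[i], positions[j]) <= radius
        ]
        scores = [
            sum(a * b for a, b in zip(features[i], features[j])) / math.sqrt(dim)
            for j in neighbours
        ]
        weights = softmax(scores)
        out.append([
            sum(w * features[j][k] for w, j in zip(weights, neighbours))
            for k in range(dim)
        ])
    return out


def encode_frame(features, positions, radius=2.0, noise_std=0.1, rng=None):
    """Run both branches in parallel, concatenate them, and add Gaussian noise."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    local = attend(features, positions, radius)   # local interaction features
    global_ = attend(features, positions, None)   # global interaction features
    fused = [loc + glob for loc, glob in zip(local, global_)]  # list concatenation
    # Noise injection (assumed form): independent N(0, noise_std^2) per element.
    return [[v + rng.gauss(0.0, noise_std) for v in row] for row in fused]


if __name__ == "__main__":
    # Three pedestrians: two close together, one far away (hypothetical data).
    positions = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
    features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for row in encode_frame(features, positions):
        print([round(v, 3) for v in row])
```

In this sketch the distant pedestrian has no neighbours within `radius`, so its local branch reduces to its own features while its global branch still mixes in information from the other two. Sampling the noise several times with different seeds would yield different embeddings, which is one simple way to obtain multiple candidate trajectories downstream.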