Jiaqi Lin, Qianqian Ren, Xingfeng Lv, Hui Xu, Yong Liu
Title: When multi-view meets multi-level: A novel spatio-temporal transformer for traffic prediction
Journal: Information Fusion, volume 117, Article 102801 (impact factor 14.7; JCR Q1, Computer Science, Artificial Intelligence)
Publication date: 2024-11-22
DOI: 10.1016/j.inffus.2024.102801
URL: https://www.sciencedirect.com/science/article/pii/S1566253524005797
Citations: 0
Abstract
Traffic prediction is a vital component of Intelligent Transportation Systems with widespread applications. The main challenge is accurately modeling the complex spatial and temporal relationships in traffic data. Spatial–temporal Graph Neural Networks (GNNs) have emerged as one of the most promising approaches to this problem. However, several key issues have not been well addressed in existing studies. First, traffic patterns exhibit significant periodic trends, yet existing methods often overlook periodicity. Second, most methods model spatial dependencies statically, which limits their ability to learn dynamic traffic patterns. Finally, achieving satisfactory results for both long-term and short-term forecasting remains a challenge. To tackle these problems, this paper proposes a Multi-level Multi-view Augmented Spatio-temporal Transformer (LVSTformer) for traffic prediction, which captures spatial dependencies at three levels (local geographic, global semantic, and pivotal nodes) along with long- and short-term temporal dependencies. Specifically, we design three spatial augmented views to explore the spatial information at these three levels. By combining the three spatial augmented views with three parallel spatial self-attention mechanisms, the model can comprehensively capture spatial dependencies at different levels. We also design a gated temporal self-attention mechanism to dynamically capture long- and short-term temporal dependencies. Furthermore, a spatio-temporal context broadcasting module is introduced between two spatio-temporal layers to ensure a well-distributed allocation of attention scores, alleviating overfitting and information loss and enhancing the generalization ability and robustness of the model. Comprehensive experiments are conducted on six well-known traffic benchmarks. The results demonstrate that LVSTformer achieves state-of-the-art performance compared to competing baselines, with a maximum improvement of 4.32%.
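
The abstract does not say how the three spatial views are built or fused, so the following is only a minimal NumPy sketch of the general idea of running parallel self-attention over differently masked views of the same nodes. Everything here is an assumption made for illustration: the boolean masks standing in for the local geographic, global semantic, and pivotal-node views, the single-head attention without learned projections, the averaging step, and all function names. It is not LVSTformer's actual implementation.

```python
import numpy as np

def softmax(scores, axis=-1):
    # Subtract the row maximum for numerical stability.
    scores = scores - scores.max(axis=axis, keepdims=True)
    e = np.exp(scores)
    return e / e.sum(axis=axis, keepdims=True)

def masked_self_attention(x, mask):
    """Single-head self-attention over nodes (hypothetical helper).

    x: (N, d) node features; mask: (N, N) boolean, True where node i
    is allowed to attend to node j.
    """
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    # Disallowed pairs get a large negative score, so their weight is ~0.
    scores = np.where(mask, scores, -1e9)
    weights = softmax(scores, axis=-1)
    return weights @ x, weights

def multi_view_spatial_attention(x, masks):
    # Run one attention per view in parallel and average the outputs
    # (averaging is an assumed fusion rule, chosen only for simplicity).
    outputs = [masked_self_attention(x, m)[0] for m in masks]
    return np.mean(outputs, axis=0)

rng = np.random.default_rng(0)
N, d = 5, 4
x = rng.normal(size=(N, d))
eye = np.eye(N, dtype=bool)

# Assumed "local geographic" view: each node sees itself and its chain neighbours.
adjacency = eye | np.eye(N, k=1, dtype=bool) | np.eye(N, k=-1, dtype=bool)

# Assumed "global semantic" view: each node sees nodes with above-median similarity.
sim = x @ x.T
semantic = (sim >= np.median(sim, axis=1, keepdims=True)) | eye

# Assumed "pivotal node" view: every node sees itself and two designated hubs.
pivotal = eye.copy()
pivotal[:, [0, 2]] = True

out = multi_view_spatial_attention(x, [adjacency, semantic, pivotal])
print(out.shape)
```

Each mask keeps the diagonal set so that every row has at least one allowed entry, which keeps the softmax well defined. A learned model would add query, key, and value projections and a trainable fusion step in place of the plain average.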
About the journal:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses as well as those demonstrating their application to real-world problems will be welcome.