Predicting primary delay of train services using graph-embedding based machine learning
Ruifan Tang, Ronghui Liu, Zhiyuan Lin
Journal of Rail Transport Planning & Management, Vol. 34, Article 100518 (published 2025-03-06)
DOI: 10.1016/j.jrtpm.2025.100518
https://www.sciencedirect.com/science/article/pii/S2210970625000150
Citations: 0
Abstract
Train delays can cause substantial economic losses and passenger dissatisfaction. The Train Delay Prediction Problem has been investigated in a large number of studies, and how best to represent certain features of a train is key to successful prediction. For instance, because of its complex topological nature, a train's route (i.e., origin, intermediate stations and destination) is one of the most difficult features to represent effectively. This study introduces graph embedding to understand and model the complex structure of a railway network, capturing a comprehensive collection of features including network topology, infrastructure and train profile. In particular, we propose, for the first time, an approach that embeds a train's route from a network-topology perspective, based on Structural Deep Network Embedding (SDNE) and Singular Value Decomposition (SVD). Compared with an established conventional method, Principal Component Analysis (PCA), our route embedding not only significantly reduces feature vector length and computational effort, but also captures network topology accurately and reliably, as evidenced by K-means clustering. Computational experiments based on real-world cases from a UK train operator (TransPennine Express) show that our graph-embedding based models are competitive in prediction accuracy and F1-score while being substantially more computationally efficient than the PCA-based approach.
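The abstract does not spell out how SDNE and SVD are combined into a route embedding, so the following is only a minimal sketch of the general SVD idea under stated assumptions, not the authors' method. It assumes each route is encoded as a multi-hot vector over stations (a hypothetical representation), compresses those vectors with a truncated SVD computed by power iteration, and then runs K-means on the short vectors, loosely mirroring the clustering check described in the abstract. It does not implement SDNE. The station names, routes and helper names (`route_matrix`, `top_k_right_singular_vectors`, `embed_routes`, `kmeans`) are all hypothetical.

```python
import math
import random


def route_matrix(routes, stations):
    """Hypothetical multi-hot encoding: one row per route, one column per station."""
    col = {s: j for j, s in enumerate(stations)}
    matrix = []
    for route in routes:
        row = [0.0] * len(stations)
        for s in route:
            row[col[s]] = 1.0  # 1.0 if the route calls at this station
        matrix.append(row)
    return matrix


def top_k_right_singular_vectors(m, k, iters=300, seed=0):
    """Top-k right singular vectors of m via power iteration on the Gram matrix m^T m.

    Each new vector is kept orthogonal to those already found, so the
    iteration converges to the next-largest eigenvector of m^T m.
    """
    rng = random.Random(seed)
    n = len(m[0])
    gram = [[sum(r[i] * r[j] for r in m) for j in range(n)] for i in range(n)]
    found = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]  # strictly positive start vector
        for _ in range(iters):
            w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
            for u in found:  # Gram-Schmidt against earlier singular vectors
                d = sum(a * b for a, b in zip(w, u))
                w = [a - d * b for a, b in zip(w, u)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm == 0.0:
                raise ValueError("k exceeds the rank of the route matrix")
            v = [a / norm for a in w]
        found.append(v)
    return found


def embed_routes(m, vectors):
    """Project each route row onto the k singular vectors (equivalent to U_k * Sigma_k)."""
    return [[sum(a * b for a, b in zip(row, v)) for v in vectors] for row in m]


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means, seeded deterministically with the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels


if __name__ == "__main__":
    # Hypothetical toy network: two groups of stations that share no station.
    stations = ["A", "B", "C", "D", "E", "F", "G", "H"]
    routes = [
        ["A", "B", "C", "D"],
        ["A", "B", "C"],
        ["B", "C", "D"],
        ["E", "F", "G"],
        ["F", "G", "H"],
    ]
    m = route_matrix(routes, stations)
    emb = embed_routes(m, top_k_right_singular_vectors(m, k=2))
    print(len(m[0]), "->", len(emb[0]))  # 8 -> 2
    print(kmeans(emb, k=2))  # [0, 0, 0, 1, 1]
```

On this toy data the eight-column route vectors shrink to two dimensions, and K-means separates the two disjoint groups of routes. A real pipeline would more likely call a library SVD (for example `numpy.linalg.svd`) than hand-rolled power iteration, which is used here only to keep the sketch dependency-free.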