Generalization strategies for improving bus travel time prediction across networks
Authors: Zack Aemmer, Sondre Sørbø, Alfredo Clemente, Massimiliano Ruocco
Journal: Journal of Urban Management, vol. 13, no. 3, pages 372-385
Published: 2024-05-29
DOI: 10.1016/j.jum.2024.05.002
URL: https://www.sciencedirect.com/science/article/pii/S222658562400061X
Journal metrics: impact factor 3.9; JCR Q1 (Urban Studies)
Citation count: 0
Abstract
This study develops and evaluates predictive models for bus travel times that can adapt to any transit network, or to new roadway segments without prior travel time data. Most prior work relies on non-standardized features such as road traffic forecasts or closed-source datasets to test predictions on a single route or network. We leverage standardized and open-source data from GTFS and GTFS-RT feeds to gather four months of realtime bus position data from the transit networks of Seattle and Trondheim. We then test and refine strategies for generalizing model predictions across both locations. To achieve this, we first develop a data pipeline to process and clean the raw data, then extract features from the standardized sources. We then evaluate the performance of several deep learning and heuristic models in predicting bus travel times between source and target bus networks. Holdout data is taken from selected routes in the source city to validate the internal generalization of the models. Data from the target city is used to evaluate the external generalization of the models. An ablation study explores the impact of different open data sources on model generalization (GPS, static timetables, OpenStreetMap and other realtime trips). We then extend the analysis to 33 international bus networks, placing the results in a broader context and testing fine-tuning strategies for generalization. Results show that deep learning methods generalize well within the source network, with as little as 1% loss in MAPE on holdout routes. With minimal fine-tuning, generalization on the target network improves significantly. Model features built on static schedule data, realtime positions or OpenStreetMap embeddings improved generalization performance (up to 10% reduction in MAPE). This was more pronounced for networks with a greater initial quantity of training data.
As a route-planning tool for roadways without prior data, geospatial data mining can provide reasonable bus travel time estimates. For cross-sectional bus network analysis, fine-tuning on at least 100 trajectory samples for each target network is required to significantly outperform baseline heuristics. This necessitates a GTFS-RT or other standardized realtime data feed in the target city.
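The results above are reported in MAPE (mean absolute percentage error). As a minimal sketch, assuming the standard definition of MAPE (the abstract does not state the exact formula the authors used), the metric could be computed as shown here. The function name `mape` and the sample travel times are hypothetical and are not taken from the study.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Return the mean absolute percentage error, in percent.

    Assumes the standard definition: mean(|actual - predicted| / |actual|) * 100.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one sample is required")
    if any(a == 0 for a in actual):
        raise ValueError("actual travel times must be nonzero")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical travel times in seconds (illustrative only, not study data).
observed = [100.0, 200.0]
predicted = [110.0, 180.0]
score = mape(observed, predicted)
print(f"MAPE: {score:.1f}%")
```

Because each error is divided by the observed travel time, the metric is scale-free, which is one reason a percentage error is convenient when comparing networks whose trips differ in typical duration.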
About the journal:
Journal of Urban Management (JUM) is the Official Journal of Zhejiang University and the Chinese Association of Urban Management, an international, peer-reviewed open access journal covering planning, administering, regulating, and governing urban complexity.
JUM has a two-fold aim: to integrate studies across fields in urban planning and management, and to provide a more holistic perspective on problem solving.
1) Explore innovative management skills for taming thorny problems that arise with global urbanization.
2) Provide a platform to deal with urban affairs whose solutions must be looked at from an interdisciplinary perspective.