Exploring the relationship between data sample size and traffic flow prediction accuracy

JCR quartile: Q1 (Engineering)
Jianhu Zheng, Minghua Wang, Mingfang Huang
Journal: Transportation Engineering, volume 18, Article 100279
Publication date: 2024-09-27
DOI: 10.1016/j.treng.2024.100279
URL: https://www.sciencedirect.com/science/article/pii/S2666691X24000538
Citation count: 0

Abstract

Extracting and analyzing large urban traffic datasets efficiently, predicting traffic conditions accurately, and improving urban traffic management all require careful selection of an appropriate data sample size. A suitable sample size is also important for sustainable transportation development. This paper investigates the relationship between traffic flow prediction performance and data sample size, considering three factors: data sample missing rate, sample duration, and road segment coverage. Real traffic flow data from 13 road sections in Changsha, China, are analyzed using Decision Tree, Support Vector Machine, Gaussian Process Regression, and Artificial Neural Network models. Key findings include: lower data loss rates improve prediction accuracy because traffic flow patterns are captured more effectively, while higher loss rates reduce accuracy; a sample duration of around 7 days balances prediction accuracy and data stability, with longer durations providing more historical data at the risk of added complexity; and broader road segment coverage gives more comprehensive traffic flow information, but excessive coverage introduces noise and limits further gains in prediction accuracy. The results highlight the significant impact of data sample size on prediction performance. Reliability can be enhanced by reducing data loss, selecting suitable durations, and choosing appropriate road segment coverage, which in turn supports improved traffic management and route planning.
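The abstract does not give the experimental procedure, so the following is only a minimal sketch of the kind of sweep it describes: vary the missing rate and the training duration, then measure prediction error on a held-out day. Everything in it is an assumption made for illustration. The flow data are synthetic (a daily sine pattern plus noise), the hourly sampling interval is invented, and a simple hour-of-day mean predictor stands in for the paper's Decision Tree, SVM, GPR, and ANN models. Road segment coverage is not modeled.

```python
import math
import random

PER_DAY = 24  # assumed hourly sampling; the source does not state the interval


def synthetic_flow(days, seed=0):
    """Hypothetical flow series: daily sine pattern plus Gaussian noise."""
    rng = random.Random(seed)
    return [
        200 + 100 * math.sin(2 * math.pi * (t % PER_DAY) / PER_DAY) + rng.gauss(0, 20)
        for t in range(days * PER_DAY)
    ]


def drop_samples(series, missing_rate, seed=1):
    """Replace a random fraction of observations with None."""
    rng = random.Random(seed)
    return [None if rng.random() < missing_rate else v for v in series]


def fit_hourly_mean(train, per_day=PER_DAY):
    """Stand-in predictor: mean of the observed values for each hour of day."""
    sums = [0.0] * per_day
    counts = [0] * per_day
    for t, v in enumerate(train):
        if v is not None:
            sums[t % per_day] += v
            counts[t % per_day] += 1
    observed = [v for v in train if v is not None]
    # Hours with no observations fall back to the overall observed mean.
    fallback = sum(observed) / len(observed) if observed else 0.0
    return [sums[h] / counts[h] if counts[h] else fallback for h in range(per_day)]


def evaluate(train_days, missing_rate):
    """Mean absolute error on one complete held-out day."""
    full = synthetic_flow(train_days + 1)
    split = train_days * PER_DAY
    train = drop_samples(full[:split], missing_rate)
    test = full[split:]
    profile = fit_hourly_mean(train)
    errors = [abs(v - profile[t % PER_DAY]) for t, v in enumerate(test)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    for days in (1, 3, 7, 14):
        for rate in (0.0, 0.3, 0.6):
            print(f"days={days:2d} missing={rate:.1f} MAE={evaluate(days, rate):.1f}")
```

Because the data are synthetic and stationary, this sketch cannot reproduce the paper's finding of an optimal duration near 7 days; it only illustrates how such a sweep over missing rate and duration can be organized.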
Source journal: Transportation Engineering (Engineering-Automotive Engineering)
CiteScore: 8.10
Self-citation rate: 0.00%
Articles published: 46
Review time: 90 days