Using traffic flow characteristics to predict real-time conflict risk: A novel method for trajectory data analysis

IF 12.6 | Zone 1, Engineering and Technology | Q1 PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH

Analytic Methods in Accident Research | Pub Date: 2022-09-01 | DOI: 10.1016/j.amar.2022.100217

Chen Yuan, Ye Li, Helai Huang, Shiqi Wang, Zhenhao Sun, Yan Li

Volume 35, Article 100217. Full text: https://www.sciencedirect.com/science/article/pii/S2213665722000069

Citations: 26

Using traffic flow characteristics to predict real-time conflict risk: A novel method for trajectory data analysis

Real-time conflict prediction models based on traffic flow characteristics are much less studied than crash-based models. This study explores the relationship between conflicts and traffic flow features, accounting for heterogeneity, and develops predictive models that identify conflict-prone conditions in real time. High-resolution trajectory data from the HighD dataset serve as the empirical data. A novel method, combining a virtual detector approach for traffic feature extraction with a two-step framework, is proposed for trajectory data analysis. The framework consists of an exploratory study using a random parameter logit model with heterogeneity in means and variances, and a comparative study of several machine learning methods: eXtreme Gradient Boosting (boosting), Random Forest (bagging), Support Vector Machine (single classifier), and Multilayer Perceptron (deep neural network). Results indicate that (1) traffic flow characteristics have significant impacts on the probability of conflict occurrence; (2) the statistical model that considers mean heterogeneity outperforms its counterpart, and lane-difference variables are found to significantly affect the means of the random parameters for both lane variables and lane-difference variables; (3) eXtreme Gradient Boosting trained on an under-sampled dataset is the best model, with the highest AUC of 0.871 and a precision of 0.867, showing that re-sampling techniques can significantly improve model performance. The proposed model is found to be sensitive to the conflict threshold. Sensitivity analysis on feature selection further confirms that conflict risk prediction should consider both subject-lane features and lane-difference features, which is consistent with the exploratory analysis based on the statistical model. This consistency between the statistical models and the machine learning methods improves the interpretability of the latter's results.
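Result (3) rests on two ingredients: balancing the classes by under-sampling before training, and scoring the classifier by AUC and precision. The sketch below illustrates both in plain Python. It is not the authors' pipeline: the function names `undersample`, `auc`, and `precision` are hypothetical, uniform random under-sampling is an assumption (the abstract does not say which under-sampling scheme was used), and the classifier itself is omitted, so `scores` stands in for whatever conflict probabilities a trained model would produce.

```python
import random


def undersample(rows, labels, seed=0):
    """Randomly drop majority-class samples until both classes have equal size.

    Assumption: uniform random under-sampling; the paper may use another scheme.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Keep every minority sample plus an equally sized random majority subset.
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in keep], [labels[i] for i in keep]


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. Quadratic in sample size, fine for a small demo.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision(labels, scores, threshold=0.5):
    """Share of predicted conflicts (score >= threshold) that are true conflicts."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if p and y == 1)
    fp = sum(1 for y, p in zip(labels, predicted) if p and y == 0)
    return tp / (tp + fp) if (tp + fp) else 0.0


if __name__ == "__main__":
    # Toy data: 2 conflict intervals (label 1) among 10 observation intervals.
    rows = list(range(10))
    labels = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
    balanced_rows, balanced_labels = undersample(rows, labels, seed=1)
    print(balanced_rows, balanced_labels)

    # Hypothetical model scores for four held-out intervals.
    print(auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6]))
    print(precision([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6]))
```

In practice the under-sampling would be applied to the training split only, so that AUC and precision are measured on data that keeps the original class imbalance.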

Source journal

Analytic Methods in Accident Research Multiple-

CiteScore: 22.10

Self-citation rate: 34.10%

Articles published:

Review time: 24 days

About the journal: Analytic Methods in Accident Research publishes articles on the development and application of advanced statistical and econometric methods for studying vehicle crashes and other accidents. The journal aims to demonstrate how these approaches can provide new insights into the factors that influence the occurrence and severity of accidents, and thereby guide the implementation of appropriate preventive measures. While its primary focus is the analytic approach, the journal also accepts articles covering various aspects of transportation safety (such as road, pedestrian, air, rail, and water safety), construction safety, and other areas where human behavior, machine failures, or system failures lead to property damage or bodily harm.