Analyzing Freeway Safety Influencing Factors Using the CatBoost Model and Interpretable Machine-Learning Framework, SHAP

IF 1.8 | Zone 4 (Engineering & Technology) | JCR Q3 (ENGINEERING, CIVIL)

Transportation Research Record | Pub Date: 2023-11-13 | DOI: 10.1177/03611981231208903

Jiaqi Li, Xuesong Wang, Xiaohan Yang, Qi Zhang, Hanzhong Pan


Citations: 0

Abstract


Analyzing Freeway Safety Influencing Factors Using the CatBoost Model and Interpretable Machine-Learning Framework, SHAP

Exploring and analyzing safety influencing factors can guide targeted traffic safety management. Traditional traffic safety models address specific data problems by adjusting the model structure; they place little focus on predictive ability and provide limited information for analyzing influencing factors. In recent years, machine-learning methods have opened new avenues in modeling: they offer higher prediction accuracy, can identify complex nonlinear relationships, and can overcome over- and under-dispersion and correlation. Machine-learning methods, however, pose the problem of limited interpretability. The interpretable machine-learning framework SHAP can be an effective solution, since it not only reflects the influence of features in each sample but also generates a global interpretation. This study established gradient boosting models, including CatBoost and XGBoost, as traffic safety models and compared them with a traditional negative binomial (NB) regression model and a zero-inflated negative binomial regression model. SHAP was used to analyze several safety influencing factors, including geometric design features, traffic operation characteristics, time of day, and land use. Results confirmed that the CatBoost model has better prediction ability and is a more suitable traffic safety model than the traditional NB regression model. Among the key findings were that ramp type is the most important factor in freeway crash frequency; curve presence has a large positive impact, while truck proportion has a large negative impact; and traffic volume is highly correlated with truck proportion. These findings can provide theoretical support for safety operation management and targeted improvement measures for freeways.
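The abstract notes that SHAP reflects the influence of features in each sample and also yields a global interpretation. The sketch below illustrates that idea with exact Shapley values computed by hand in plain Python. It does not use the shap library, the CatBoost model, or the paper's data. The function `toy_model`, the three feature names, and the background rows are all hypothetical, chosen only so the arithmetic is easy to follow.

```python
from itertools import combinations
from math import factorial

FEATURES = ["ramp", "curve", "truck_share"]  # hypothetical feature names


def toy_model(x):
    """Hypothetical crash-frequency predictor, invented for illustration only."""
    ramp, curve, truck_share = x
    return 2.0 + 1.5 * ramp + 1.0 * curve - 2.0 * truck_share + 0.5 * ramp * curve


def coalition_value(model, x, background, subset):
    """Mean prediction when features in `subset` take x's values and the
    remaining features take each background row's values in turn."""
    preds = [
        model([x[i] if i in subset else row[i] for i in range(len(x))])
        for row in background
    ]
    return sum(preds) / len(preds)


def shapley_values(model, x, background):
    """Exact Shapley values by enumerating every coalition (exponential cost,
    so only practical for a handful of features)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = set(combo)
                gain = (coalition_value(model, x, background, s | {i})
                        - coalition_value(model, x, background, s))
                phi[i] += weight * gain
    return phi


# Synthetic background rows (ramp, curve, truck_share); not real freeway data.
background = [(0, 0, 0.1), (1, 0, 0.2), (0, 1, 0.3), (1, 1, 0.4)]

# Base value: the mean prediction over the background set (empty coalition).
base_value = coalition_value(toy_model, background[0], background, set())

# Local explanation for one sample.
x = (1, 1, 0.4)
phi = shapley_values(toy_model, x, background)
for name, contribution in zip(FEATURES, phi):
    print(f"{name}: {contribution:+.4f}")

# Global view: mean absolute Shapley value of each feature across all rows.
all_phi = [shapley_values(toy_model, row, background) for row in background]
global_importance = {
    name: sum(abs(p[i]) for p in all_phi) / len(all_phi)
    for i, name in enumerate(FEATURES)
}
print(global_importance)
```

Under these assumptions, the per-sample contributions plus the base value add up to the model's prediction for that sample, and averaging absolute contributions over samples gives one common way to rank features globally. For tree ensembles, dedicated tools approximate or compute these values far more efficiently than this brute-force enumeration.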


Source journal

Transportation Research Record (Engineering & Technology - Engineering: Civil)

CiteScore: 3.20

Self-citation rate: 11.80%

Articles published: 918

Review time: 4.2 months

Journal description: Transportation Research Record: Journal of the Transportation Research Board is one of the most cited and prolific transportation journals in the world, offering unparalleled depth and breadth in the coverage of transportation-related topics. The TRR publishes approximately 70 issues annually of outstanding, peer-reviewed papers presenting research findings in policy, planning, administration, economics and financing, operations, construction, design, maintenance, safety, and more, for all modes of transportation. This site provides electronic access to a full compilation of papers since the 1996 series.