A hybrid modelling framework of machine learning and extreme value theory for crash risk estimation using traffic conflicts

Journal metrics: impact factor 12.5; Zone 1 (Engineering & Technology); JCR Q1 (Public, Environmental & Occupational Health)
Fizza Hussain, Yuefeng Li, Ashutosh Arun, Md. Mazharul Haque
Analytic Methods in Accident Research, Volume 36, Article 100248 (published 1 December 2022). DOI: 10.1016/j.amar.2022.100248. https://www.sciencedirect.com/science/article/pii/S2213665722000379
Citations: 15

Abstract

Extreme value theory is the state-of-the-art modelling technique for estimating crash risk from traffic conflicts, with two sampling techniques at its core: block maxima and peak-over-threshold. However, the uncertainty of the estimates obtained with these sampling techniques has been too large for widespread practical use. A fundamental reason is the improper selection of extreme values and the lack of a suitable, efficient sampling mechanism. This study proposes a hybrid modelling framework of machine learning and extreme value theory that estimates crash risk from traffic conflicts using an efficient sampling technique for identifying extremes. Specifically, the conventional sampling techniques are replaced with machine learning-based anomaly detection, because an anomaly (a data point that does not conform to the rest of the data) closely matches the definition of an extreme value. Six representative unsupervised anomaly detection algorithms were tested: isolation forest (iforest), minimum covariance determinant, one-class support vector machine, k-nearest neighbours, local outlier factor, and connectivity-based outlier factor. The extremes identified by these algorithms were then fitted to extreme value distributions in both univariate and bivariate frameworks. The algorithms were tested on a large set of traffic conflict data collected on four weekdays (6 am to 6 pm) at three four-legged intersections in Brisbane, Australia. Results indicate that the proposed hybrid models consistently outperform conventional extreme value models that use block maxima or peak-over-threshold as the underlying sampling technique. Among the sampling algorithms, iforest performed best in estimating crash risk from traffic conflicts. The proposed hybrid modelling framework is a methodological advance in traffic conflict-based crash estimation models and opens new avenues for using machine learning within existing traffic conflict techniques.
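The abstract gives no implementation details, so the following is only a minimal Python sketch of the univariate idea, built on assumptions that do not come from the paper: the conflict indicator is time-to-collision (TTC), negated so that dangerous events are maxima and a crash corresponds to negated TTC ≥ 0; isolation forest (iforest) is implemented from scratch for one-dimensional data with common default settings (100 trees, subsamples of 256); the k most anomalous observations on the short-TTC side of the median are treated as the exceedance sample of a generalized Pareto distribution (GPD), fitted here by the method of moments rather than maximum likelihood; and expected crashes are taken as N_u × P(X ≥ 0 | X > u). All function names are hypothetical.

```python
# Hypothetical sketch, not the authors' code. Assumptions: negated TTC as the conflict
# indicator, a from-scratch 1-D isolation forest, a median filter to keep only the
# dangerous tail, and a method-of-moments GPD fit instead of maximum likelihood.
import math
import random

EULER_GAMMA = 0.5772156649015329


def c_factor(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n > 2:
        return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n
    return 1.0 if n == 2 else 0.0


def build_tree(xs, depth, limit, rng):
    """Grow a 1-D isolation tree by splitting at uniformly random points."""
    if depth >= limit or len(xs) <= 1 or min(xs) == max(xs):
        return ("leaf", len(xs))
    split = rng.uniform(min(xs), max(xs))
    left = [x for x in xs if x < split]
    right = [x for x in xs if x >= split]
    return ("node", split,
            build_tree(left, depth + 1, limit, rng),
            build_tree(right, depth + 1, limit, rng))


def path_length(x, tree, depth=0):
    """Depth of the leaf reached by x, plus c(size) for the unbuilt part of that leaf."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, split, left, right = tree
    return path_length(x, left if x < split else right, depth + 1)


def iforest_scores(xs, n_trees=100, sample_size=256, seed=0):
    """Score s(x) = 2 ** (-E[h(x)] / c(psi)); scores closer to 1 are more anomalous."""
    rng = random.Random(seed)
    psi = min(sample_size, len(xs))  # needs at least 2 observations
    limit = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(xs, psi), 0, limit, rng) for _ in range(n_trees)]
    norm = c_factor(psi)
    return [2.0 ** (-(sum(path_length(x, t) for t in trees) / n_trees) / norm)
            for x in xs]


def select_extremes(neg_ttc, contamination=0.05, **kwargs):
    """Keep the k most anomalous observations above the median (the short-TTC side)."""
    scores = iforest_scores(neg_ttc, **kwargs)
    median = sorted(neg_ttc)[len(neg_ttc) // 2]
    upper = [i for i, x in enumerate(neg_ttc) if x > median]
    k = max(2, int(contamination * len(neg_ttc)))
    top = sorted(upper, key=lambda i: scores[i], reverse=True)[:k]
    return sorted(neg_ttc[i] for i in top)


def fit_gpd_mom(excesses):
    """Method of moments: mean = sigma/(1-xi), var = sigma^2/((1-xi)^2 (1-2xi))."""
    n = len(excesses)
    m = sum(excesses) / n
    v = sum((y - m) ** 2 for y in excesses) / (n - 1)
    ratio = m * m / v
    return 0.5 * (1.0 - ratio), 0.5 * m * (1.0 + ratio)  # (xi, sigma)


def expected_crashes(extremes):
    """N_u * P(negated TTC >= 0 | exceedance), from the fitted GPD tail."""
    u = extremes[0] - 1e-9  # threshold just below the smallest selected extreme (u < 0)
    xi, sigma = fit_gpd_mom([x - u for x in extremes])
    z = -u                  # distance from the threshold to the crash boundary (TTC = 0)
    if abs(xi) < 1e-12:
        tail = math.exp(-z / sigma)
    else:
        base = 1.0 + xi * z / sigma
        tail = base ** (-1.0 / xi) if base > 0 else 0.0  # 0 beyond the endpoint if xi < 0
    return len(extremes) * tail, xi, sigma


if __name__ == "__main__":
    rng = random.Random(42)
    ttc = [rng.lognormvariate(1.0, 0.5) for _ in range(1000)]  # synthetic TTC, seconds
    ttc.append(0.05)                                            # one near-crash event
    neg_ttc = [-t for t in ttc]
    extremes = select_extremes(neg_ttc)
    crashes, xi, sigma = expected_crashes(extremes)
    print(f"{len(extremes)} extremes, xi={xi:.3f}, sigma={sigma:.3f}, "
          f"expected crashes={crashes:.4f}")
```

The median filter is a design choice of this sketch: in one dimension isolation forest flags both tails, but only the short-TTC side describes near-crashes. A bivariate version would replace the GPD step with a bivariate extreme value model.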

Source journal
CiteScore: 22.10
Self-citation rate: 34.10%
Articles published: 35
Review time: 24 days
Journal description: Analytic Methods in Accident Research publishes articles on the development and application of advanced statistical and econometric methods for studying vehicle crashes and other accidents. The journal aims to show how these approaches can yield new insights into the factors that influence the occurrence and severity of accidents, and thereby guide appropriate preventive measures. Although its focus is the analytic approach, it also accepts articles on other aspects of transportation safety (road, pedestrian, air, rail, and water safety), construction safety, and other areas where human behavior, machine failures, or system failures cause property damage or bodily harm.