Title: A hybrid modelling framework of machine learning and extreme value theory for crash risk estimation using traffic conflicts
Authors: Fizza Hussain, Yuefeng Li, Ashutosh Arun, Md. Mazharul Haque
Journal: Analytic Methods in Accident Research, volume 36, Article 100248
Publication date: 2022-12-01
DOI: 10.1016/j.amar.2022.100248
URL: https://www.sciencedirect.com/science/article/pii/S2213665722000379
Impact factor: 12.5 (JCR Q1, Public, Environmental & Occupational Health)
Citations: 15
Abstract
Extreme value theory is the state-of-the-art modelling technique for estimating crash risk from traffic conflicts, with two different sampling techniques, i.e. block maxima and peak-over-threshold, at its core. However, the uncertainty associated with the estimates obtained by these sampling techniques has been too large to enable its widespread practical use. A fundamental reason for this issue is the improper selection of extreme values and the lack of a suitable and efficient sampling mechanism. This study proposes a hybrid modelling framework of machine learning and extreme value theory to estimate crash risk from traffic conflicts with an efficient sampling technique for identifying extremes. More specifically, a machine learning approach replaces the conventional sampling techniques with anomaly detection techniques, since an anomaly is a data point that does not conform to the rest of the data, making it very similar to the definition of an extreme value. Six representative machine learning-based unsupervised anomaly detection algorithms have been tested in this study. They include iforest, minimum covariance determinant, one-class support vector machine, k-nearest neighbours, local outlier factor, and connectivity-based outlier factor. The extremes identified by these algorithms are then fitted to extreme value distributions for both univariate and bivariate frameworks. These algorithms were tested on a large set of traffic conflict data collected for four weekdays (6 am to 6 pm) from three four-legged intersections in Brisbane, Australia. Results indicate that the proposed hybrid models consistently outperform the conventional extreme value models, which use block maxima and peak-over-threshold as the underlying sampling technique. Among the sampling algorithms, iforest has been found to perform better than the other algorithms in estimating crash risks from traffic conflicts. The proposed hybrid modelling framework represents a methodological advancement in traffic conflict-based crash estimation models and opens new avenues for exploring the possibility of utilising machine learning techniques within the existing traffic conflict techniques.
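The univariate version of the idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: the conflict indicator is taken to be a negated time-to-collision (so a crash corresponds to a value of 0 or more), the data are synthetic, scikit-learn's `IsolationForest` stands in for "iforest", and the selected extremes are fitted to a generalised Pareto distribution using the smallest selected extreme as an implied threshold. The helper names `select_extremes_iforest` and `crash_probability` are hypothetical.

```python
import numpy as np
from scipy.stats import genpareto
from sklearn.ensemble import IsolationForest


def select_extremes_iforest(neg_indicator, contamination=0.05, seed=0):
    """Return upper-tail anomalies of a 1-D negated conflict indicator.

    Assumption: anomalies above the median are treated as the extremes;
    anomalies in the lower (safe) tail are discarded.
    """
    x = np.asarray(neg_indicator, dtype=float).reshape(-1, 1)
    model = IsolationForest(contamination=contamination, random_state=seed)
    labels = model.fit_predict(x)  # -1 marks an anomaly, 1 an inlier
    is_anomaly = labels == -1
    in_upper_tail = x[:, 0] > np.median(x)
    return x[is_anomaly & in_upper_tail, 0]


def crash_probability(extremes, n_total):
    """Estimate P(negated indicator >= 0) per observed conflict.

    Assumption: the smallest selected extreme acts as an implied threshold,
    and the excesses over it follow a generalised Pareto distribution.
    """
    threshold = extremes.min()
    excesses = extremes - threshold
    # Fix the location at 0 so only the shape and scale are estimated.
    shape, _, scale = genpareto.fit(excesses, floc=0.0)
    p_extreme = extremes.size / n_total
    # A crash means the excess reaches the distance from threshold to 0.
    p_crash_given_extreme = genpareto.sf(-threshold, shape, loc=0.0, scale=scale)
    return float(p_extreme * p_crash_given_extreme)


# Synthetic negated time-to-collision values in seconds (illustrative only).
rng = np.random.default_rng(42)
neg_ttc = rng.normal(loc=-5.0, scale=1.0, size=5000)

extremes = select_extremes_iforest(neg_ttc)
p_crash = crash_probability(extremes, n_total=neg_ttc.size)
print(f"{extremes.size} extremes selected; crash probability per conflict = {p_crash:.3g}")
```

In this sketch the anomaly detector takes over the role that a block size or a fixed threshold plays in block maxima and peak-over-threshold sampling; the `contamination` value is a tuning choice made here for illustration, not a value reported in the study. A fitted shape parameter below zero implies a bounded tail, in which case the estimated crash probability can be exactly zero.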
About the journal:
Analytic Methods in Accident Research is a journal that publishes articles related to the development and application of advanced statistical and econometric methods in studying vehicle crashes and other accidents. The journal aims to demonstrate how these innovative approaches can provide new insights into the factors influencing the occurrence and severity of accidents, thereby offering guidance for implementing appropriate preventive measures. While the journal primarily focuses on the analytic approach, it also accepts articles covering various aspects of transportation safety (such as road, pedestrian, air, rail, and water safety), construction safety, and other areas where human behavior, machine failures, or system failures lead to property damage or bodily harm.