Identification of adequate sample size for conflict-based crash risk evaluation: An investigation using Bayesian hierarchical extreme value theory models
IF 12.5; Zone 1, Engineering & Technology; JCR Q1, Public, Environmental & Occupational Health
Chuanyun Fu, Tarek Sayed
Analytic Methods in Accident Research, Vol. 39, Article 100281 (published 2023-09-01)
DOI: 10.1016/j.amar.2023.100281
URL: https://www.sciencedirect.com/science/article/pii/S2213665723000167
Citations: 0
Abstract
The use of traffic conflict-based models to estimate crash risk and evaluate the safety of road locations is a popular direction in road safety analysis. However, a challenging issue in traffic conflict-based crash risk modeling is the selection of an appropriate sample size. Reliable conflict-based crash risk models typically require a large sample, which is often very difficult to collect. Further, when choosing a sample size, the bias-variance trade-off in model estimation is a constant concern. This study proposes an approach for identifying an adequate sample size for conflict-based crash risk estimation models, in which adequacy is determined by checking model convergence and goodness-of-fit. A quantitative approach for objectively testing model goodness-of-fit is developed. Model convergence is inspected through both the trace plots and the trends of the Brooks-Gelman-Rubin statistics of the parameter simulation chains. A graphical method is also used to check model goodness-of-fit. If the model has not converged or fits poorly, additional samples are required. The proposed method was applied to identify the adequate sample size for a Bayesian hierarchical extreme value theory (EVT) block maxima (BM) model using traffic conflict data from four signalized intersections in the city of Surrey, British Columbia. Traffic conflicts were delineated using the modified time to collision (MTTC) indicator. A series of stationary and non-stationary Bayesian hierarchical BM models were developed using the cycle-level maxima of negated MTTC, and the adequate sample sizes of the stationary and non-stationary models were determined separately. Further, two methods of increasing the sample size (i.e., extending the observation period and combining data from different sites) were compared in terms of goodness-of-fit as well as crash estimation accuracy and precision.
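The convergence check described above relies on the Brooks-Gelman-Rubin (BGR) statistic. The sketch below assumes that posterior draws for one parameter are available as an array of shape (chains, iterations). It computes the variance-based potential scale reduction factor (R-hat) and tracks its trend over growing iteration windows. The function names `gelman_rubin` and `rhat_trajectory` are hypothetical. This is the classic variance-based form. WinBUGS-style BGR plots use interval widths instead, and the abstract does not say which variant the authors used.

```python
import numpy as np


def gelman_rubin(chains):
    """Variance-based potential scale reduction factor (R-hat) for one parameter.

    chains: array-like of shape (m, n), i.e. m parallel chains with n draws each.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    if m < 2 or n < 2:
        raise ValueError("need at least 2 chains with at least 2 draws each")
    # W: mean of the within-chain sample variances
    w = chains.var(axis=1, ddof=1).mean()
    # B: between-chain variance of the chain means, scaled by n
    b = n * chains.mean(axis=1).var(ddof=1)
    # Pooled estimate of the marginal posterior variance
    var_hat = (n - 1) / n * w + b / n
    return float(np.sqrt(var_hat / w))


def rhat_trajectory(chains, step=500):
    """R-hat over growing windows of k = step, 2*step, ... iterations.

    As is common practice with this diagnostic, the first half of each
    window is discarded and R-hat is computed on the second half.
    """
    chains = np.asarray(chains, dtype=float)
    n = chains.shape[1]
    ks = list(range(step, n + 1, step))
    values = [gelman_rubin(chains[:, k // 2 : k]) for k in ks]
    return ks, values
```

For example, `rhat_trajectory(draws, step=500)` returns the window lengths and the matching R-hat values, which can be plotted to see whether the statistic settles near 1. A commonly used rule of thumb treats values below about 1.1 as acceptable. Trace plots should still be inspected alongside it, as the abstract describes.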
The results show that, for both stationary and non-stationary models, the sample size used is adequate for model convergence and goodness-of-fit. Moreover, adding covariates to the stationary Bayesian hierarchical BM model does not change the required sample size. Extending the observation period outperforms combining data from different sites in terms of the goodness-of-fit, crash estimation accuracy, and precision of the non-stationary models. This is likely related to unmeasured factors that can impair model estimation and inference when data from several sites are merged to increase the number of samples. Overall, the findings of this study can be used to examine whether the available data are adequate and how much additional data is required to produce reliable statistical inference.
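The sketch below illustrates what "crash estimation accuracy and precision" can mean for a BM model. It assumes the convention commonly used in conflict-based EVT work:

- The cycle-level maximum of negated MTTC, Z, follows a generalized extreme value (GEV) distribution G(z) = exp{-[1 + ξ(z - μ)/σ]^(-1/ξ)}.
- A crash corresponds to Z ≥ 0, that is, MTTC ≤ 0.
- The per-cycle crash risk is R = 1 - G(0), and the expected crashes over N cycles are N·R. This is evaluated once per posterior draw of (μ, σ, ξ).

Two further choices are illustrative assumptions, not necessarily the paper's metrics. Precision is measured by the 95% credible interval width. Accuracy is measured by the relative error against observed crashes. The function names are hypothetical. SciPy's `scipy.stats.genextreme` uses the opposite sign for the shape parameter (c = -ξ), so the CDF is written out explicitly here.

```python
import numpy as np


def gev_exceedance_prob(mu, sigma, xi, threshold=0.0):
    """P(Z >= threshold) for a GEV(mu, sigma, xi) block maximum Z (vectorized)."""
    mu, sigma, xi = (np.asarray(a, dtype=float) for a in (mu, sigma, xi))
    if np.any(sigma <= 0):
        raise ValueError("sigma must be positive")
    z = (threshold - mu) / sigma
    is_gumbel = np.abs(xi) < 1e-8
    xi_safe = np.where(is_gumbel, 1.0, xi)  # placeholder to avoid division by zero
    t = 1.0 + xi_safe * z
    in_support = t > 0
    t_safe = np.where(in_support, t, 1.0)  # placeholder to avoid invalid powers
    # Outside the support: below the lower endpoint (xi > 0) the CDF is 0,
    # above the upper endpoint (xi < 0) the CDF is 1
    cdf_gev = np.where(in_support,
                       np.exp(-t_safe ** (-1.0 / xi_safe)),
                       np.where(xi_safe > 0, 0.0, 1.0))
    cdf_gumbel = np.exp(-np.exp(-z))  # xi -> 0 limit
    cdf = np.where(is_gumbel, cdf_gumbel, cdf_gev)
    return 1.0 - cdf


def summarize_crash_estimates(mu, sigma, xi, n_blocks, observed=None, level=0.95):
    """Posterior summary of expected crashes = n_blocks * P(Z >= 0), per draw."""
    crashes = n_blocks * gev_exceedance_prob(mu, sigma, xi)
    lo, hi = np.percentile(crashes, [50 * (1 - level), 50 * (1 + level)])
    summary = {
        "mean": float(crashes.mean()),
        "lower": float(lo),
        "upper": float(hi),
        "ci_width": float(hi - lo),  # assumed precision measure: narrower is better
    }
    if observed is not None:
        # Assumed accuracy measure: relative error of the posterior mean
        summary["rel_error"] = abs(summary["mean"] - observed) / observed
    return summary
```

For a non-stationary model, μ (and possibly σ) vary by cycle with covariates. In that case one would sum the per-cycle risks for each draw rather than multiply a single risk by N.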
About the journal:
Analytic Methods in Accident Research is a journal that publishes articles related to the development and application of advanced statistical and econometric methods in studying vehicle crashes and other accidents. The journal aims to demonstrate how these innovative approaches can provide new insights into the factors influencing the occurrence and severity of accidents, thereby offering guidance for implementing appropriate preventive measures. While the journal primarily focuses on the analytic approach, it also accepts articles covering various aspects of transportation safety (such as road, pedestrian, air, rail, and water safety), construction safety, and other areas where human behavior, machine failures, or system failures lead to property damage or bodily harm.