Risk Mapping of Wildlife-Vehicle Collisions across the State of Montana, U.S.A.: A Machine Learning Approach for Imbalanced Data along Rural Roads
Matthew Bell, Yiyi Wang, Rob Ament
Transportation Safety and Environment, published 2023-11-30
DOI: 10.1093/tse/tdad043 (https://doi.org/10.1093/tse/tdad043)
Abstract
Wildlife-vehicle collisions (WVCs) with large animals are estimated to cause over $8 billion in property damage, tens of thousands of human injuries, and nearly 200 fatalities in the United States each year. Most WVCs occur on rural roads and are not recorded evenly across road segments, which leads to imbalanced data. When large geographic areas are investigated for collision risk, a disproportionate number of analysis units have zero WVC cases. Analysis units with zero WVCs can reduce prediction accuracy and weaken the coefficient estimates of statistical learning models. This study demonstrates that using the synthetic minority over-sampling technique (SMOTE) to handle imbalanced WVC data, in combination with statistical and machine learning models, improves the ability to determine seasonal WVC risk across the rural highway network in Montana, USA. An array of regularized variables describing landscape, road, and traffic was used to develop negative binomial and random forest (RF) models that infer WVC rates per 100 million vehicle-miles traveled. The RF model is found to work particularly well with SMOTE-augmented data, improving the prediction accuracy of seasonal WVC risk. SMOTE-augmented data are also found to improve the accuracy of crash-risk predictions across fine-grained grids while retaining the characteristics of the original dataset. The analyses suggest that SMOTE augmentation mitigates the data imbalance encountered in seasonally divided WVC data. This research provides a basis for future risk-mapping models and can potentially be used to address the low rates of WVCs and other crash types along rural roads.
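
As a rough illustration of the interpolation idea behind SMOTE, the sketch below creates synthetic minority samples by picking a random point on the line segment between a minority sample and one of its nearest minority neighbours. It is not the authors' pipeline, which the abstract does not detail. The function name `smote`, its parameters, and the two example features (a traffic volume and a forest-cover share) are hypothetical and chosen only for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Create synthetic minority samples by interpolating between neighbours.

    minority: list of equal-length feature vectors (lists of floats).
    n_synthetic: number of synthetic samples to create.
    k: number of nearest minority neighbours to choose from.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical analysis units with at least one WVC (the minority class):
# [traffic volume, forest-cover share]. Values are made up for illustration.
cells_with_wvc = [[1200.0, 0.40], [1500.0, 0.55], [900.0, 0.35], [1100.0, 0.50]]
new_cells = smote(cells_with_wvc, n_synthetic=6, k=2)
```

Because each synthetic sample lies between two observed samples, the augmented data stay within the range of the original minority class, which fits the abstract's remark that the augmented data retain the characteristics of the original dataset. In practice the features would normally be scaled first, so that one large-valued variable does not dominate the distance calculation.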