Predicting Seriousness of Injury in a Traffic Accident: A New Imbalanced Dataset and Benchmark

Paschalis Lagias, G. Magoulas, Y. Prifti, A. Provetti
International Conference on Engineering Applications of Neural Networks. Published 2022-05-20. DOI: 10.48550/arXiv.2205.10441 (https://doi.org/10.48550/arXiv.2205.10441)

Abstract

The paper introduces a new dataset to assess the performance of machine learning algorithms in the prediction of the seriousness of injury in a traffic accident. The dataset is created by aggregating publicly available datasets from the UK Department for Transport, which are drastically imbalanced, with missing attributes sometimes approaching 50% of the overall data dimensionality. The paper presents the data analysis pipeline starting from the publicly available data of road traffic accidents and ending with predictors of possible injuries and their degree of severity. It addresses the huge incompleteness of public data with a MissForest model. The paper also introduces two baseline approaches to create injury predictors: a supervised artificial neural network and a reinforcement learning model. The dataset can potentially stimulate diverse aspects of machine learning research on imbalanced datasets, and the two approaches can be used as baseline references when researchers test more advanced learning algorithms in this area.
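The two data-quality properties named in the abstract, missing attributes per record and class imbalance, can be measured with a few lines of Python. The sketch below is illustrative only: the attribute names, the severity labels, and the use of `-1` as a missing-value sentinel are assumptions made for the example, not details taken from the paper.

```python
from collections import Counter

# Assumption: missing values are encoded as -1 (or None) in the raw records.
MISSING = -1


def missing_fraction(record):
    """Return the fraction of attributes in one record that are missing."""
    values = list(record.values())
    return sum(v is None or v == MISSING for v in values) / len(values)


def class_shares(labels):
    """Return the share of each class among the given labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


# Toy records with hypothetical attribute names (not the dataset's schema).
records = [
    {"speed_limit": 30, "light": 1, "weather": -1, "road_type": -1},
    {"speed_limit": 60, "light": 4, "weather": 2, "road_type": 6},
]
# Toy severity labels, deliberately skewed towards one class.
labels = ["slight", "slight", "slight", "serious"]

fractions = [missing_fraction(r) for r in records]
shares = class_shares(labels)
print(fractions)
print(shares)
```

A per-record missing fraction close to 0.5 corresponds to the "approaching 50% of the overall data dimensionality" case described above, and a heavily skewed `shares` dictionary is what makes plain accuracy a poor metric for severity predictors.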