Artur Grigorev, Khaled Saleh, Yuming Ou, Adriana-Simona Mihaita
arXiv:2403.13547 (arXiv - CS - Systems and Control), published 2024-03-20
Integrating Large Language Models for Severity Classification in Traffic Incident Management: A Machine Learning Approach
This study evaluates the impact of large language models on enhancing machine
learning processes for managing traffic incidents. It examines the extent to
which features generated by modern language models improve or match the
accuracy of predictions when classifying the severity of incidents using
accident reports. Multiple comparisons were performed between combinations of
language models and machine learning algorithms, including Gradient Boosted
Decision Trees, Random Forests, and Extreme Gradient Boosting. Our research
uses conventional features from incident reports, language-model-derived
features from their texts, and combinations of the two to classify severity.
Incorporating language-model features alongside those obtained directly from
incident reports has been shown to improve, or at least match, the performance
of machine learning techniques in assigning severity levels to incidents,
particularly with Random Forests and Extreme Gradient Boosting. The comparisons
were quantified with the F1-score on data sets sampled uniformly to balance the
severity classes. The primary contribution of this research is a demonstration
of how Large Language Models can be integrated into machine learning workflows
for incident management, simplifying feature extraction from unstructured text
while improving or matching the precision of severity predictions obtained with
conventional machine learning pipelines. The engineering application of this
research is illustrated by the effective use of these language processing
models to refine the modelling process for incident severity classification.
This work offers insights into combining language processing capabilities with
traditional data to improve machine learning pipelines for incident severity
classification.
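The evaluation protocol summarized above (uniform sampling to balance severity classes, concatenating conventional and language-model features, scoring with F1) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the record layout, the `severity` key, downsampling every class to the size of the smallest one, and macro-averaging of F1 are all assumptions, since the abstract does not specify them, and every function name here is hypothetical. The classifiers themselves (Gradient Boosted Decision Trees, Random Forests, Extreme Gradient Boosting) are left out.

```python
import random
from collections import defaultdict


def balance_by_uniform_sampling(records, label_key="severity", seed=0):
    """Hypothetical helper: uniformly downsample each class to the smallest class size."""
    by_class = defaultdict(list)
    for record in records:
        by_class[record[label_key]].append(record)
    n_per_class = min(len(group) for group in by_class.values())
    rng = random.Random(seed)  # seeded so the balanced sample is reproducible
    balanced = []
    for label in sorted(by_class):
        # Random.sample draws without replacement, uniformly at random
        balanced.extend(rng.sample(by_class[label], n_per_class))
    rng.shuffle(balanced)
    return balanced


def combine_features(conventional, text_features):
    """Hypothetical helper: concatenate structured report fields with text-derived features."""
    return list(conventional) + list(text_features)


def f1_for_label(y_true, y_pred, label):
    """One-vs-rest F1 score for a single severity label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0  # no true positives, so F1 is taken as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Assumed averaging: unweighted mean of per-label F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)


if __name__ == "__main__":
    reports = [{"id": i, "severity": "minor"} for i in range(5)]
    reports += [{"id": 5 + i, "severity": "major"} for i in range(2)]
    sample = balance_by_uniform_sampling(reports)
    print(sorted(r["severity"] for r in sample))  # prints ['major', 'major', 'minor', 'minor']
    print(combine_features([1.0, 0.0], [0.25, 0.75]))  # prints [1.0, 0.0, 0.25, 0.75]
    print(round(macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"]), 4))  # prints 0.7333
```

Downsampling to the smallest class keeps every retained report genuine rather than synthesizing new ones, at the cost of discarding data from the larger classes. Where scikit-learn is available, `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` computes the same macro-averaged score.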