Research on Text Fault Recognition for On-board Equipment of C3 Train Control System Based on Integrated XGBoost Algorithm

IF 2.7 | Zone 4, Engineering & Technology | Q2 | TRANSPORTATION SCIENCE & TECHNOLOGY
Li Yue, Luyue Liu, Maoqing Li, Baodi Xiao, Xiaochun Wu
Journal: Transportation Safety and Environment. Publication type: Journal Article. Published: 2022-12-21. DOI: 10.1093/tse/tdac066 (https://doi.org/10.1093/tse/tdac066)
Citations: 0

Abstract

The safe operation of a high-speed train depends on the reliability of its train control on-board equipment. To help technicians identify the fault category of this equipment accurately and quickly, a fault diagnosis model is built using XGBoost (eXtreme Gradient Boosting), an ensemble learning algorithm. XGBoost iteratively builds multiple decision trees, each fitting the residual of the previous prediction, and adds regularization terms to improve diagnostic accuracy. First, text features were extracted using an improved TF-IDF (Term Frequency–Inverse Document Frequency) approach, and 24 fault feature words were selected and converted into weighted word vectors. Second, because the fault categories in the data set are imbalanced, the ADASYN (Adaptive Synthetic sampling) oversampling technique was used to synthesize samples for the minority fault categories. Finally, the fault text data for CTCS-3 train control on-board equipment, recorded by Guangzhou Railway Group maintenance personnel, were split into training and test sets. After parameter tuning through grid search, the XGBoost model was used to locate faults in the test set automatically. Compared with other methods, the XGBoost model's evaluation indices improved significantly. The diagnostic accuracy reached 95.43%, which verifies the effectiveness of the method for text fault diagnosis.
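The feature-extraction step can be illustrated with a minimal sketch. The abstract does not specify how the authors' improved TF-IDF differs from the standard one, so the code below uses a textbook smoothed TF-IDF instead. The function name `tfidf_vectors`, the example records, and the four feature words are all hypothetical and are not taken from the paper's data set.

```python
import math
from collections import Counter


def tfidf_vectors(docs, vocabulary):
    """Return one TF-IDF weight vector per tokenized document.

    Standard smoothed TF-IDF, NOT the paper's improved variant.
    """
    n_docs = len(docs)
    # Document frequency: number of records that contain each feature word.
    df = {w: sum(1 for d in docs if w in d) for w in vocabulary}
    # Smoothed IDF, so a word absent from every record cannot divide by zero.
    idf = {w: math.log((1 + n_docs) / (1 + df[w])) + 1.0 for w in vocabulary}
    vectors = []
    for d in docs:
        counts = Counter(d)
        total = len(d) or 1  # guard against an empty record
        # Term frequency (relative count) times inverse document frequency.
        vectors.append([counts[w] / total * idf[w] for w in vocabulary])
    return vectors


# Hypothetical tokenized fault records (illustrative only).
records = [
    ["balise", "transmission", "module", "fault"],
    ["radio", "timeout", "connection", "lost"],
    ["balise", "message", "lost"],
]
# Hypothetical feature words; the paper selects 24 of them.
feature_words = ["balise", "radio", "lost", "timeout"]

vectors = tfidf_vectors(records, feature_words)
for rec, vec in zip(records, vectors):
    print(rec, [round(x, 3) for x in vec])
```

Each record becomes a fixed-length vector with one weight per feature word. Words that occur in fewer records receive a larger weight. In the pipeline the abstract describes, vectors of this kind would then be oversampled with ADASYN and passed to the XGBoost classifier.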
Source journal
Transportation Safety and Environment (TRANSPORTATION SCIENCE & TECHNOLOGY)
CiteScore: 3.90
Self-citation rate: 13.60%
Articles published: 32
Review time: 10 weeks