Application of Improved DBSCAN Clustering Algorithm on Industrial Fault Text Data

Xiaohan Wang, Lin Zhang, Xuesong Zhang, Kunyu Xie
{"title":"Application of Improved DBSCAN Clustering Algorithm on Industrial Fault Text Data","authors":"Xiaohan Wang, Lin Zhang, Xuesong Zhang, Kunyu Xie","doi":"10.1109/INDIN45582.2020.9442093","DOIUrl":null,"url":null,"abstract":"The industrial fault text data are the special type of short texts, and they come from the records of faults in the factory. Clustering the industrial fault text data can reduce the redundant data and find out the hidden information, which is of great significance to improve the utilization of the industrial fault text data. The industrial fault text data are unstructured and irregular, so the clustering faces quite a few challenges. This paper introduces some existing algorithms for the clustering of short texts, and the shortcomings of them are briefly analyzed. This paper indicates that the main problem of the clustering of the industrial fault text data is the contradiction between the requirements and the setup of parameters, and it leads to low accuracy when cluster the corpus of different sizes. To increase the accuracy of clustering, an improved clustering algorithm is proposed which can solve this contradiction. The results of the comparative experiments show that the improved clustering algorithm has better performance than DBSCAN in corpus of different sizes on the industrial fault text data.","PeriodicalId":185948,"journal":{"name":"2020 IEEE 18th International Conference on Industrial Informatics (INDIN)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE 18th International Conference on Industrial Informatics (INDIN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/INDIN45582.2020.9442093","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The industrial fault text data are the special type of short texts, and they come from the records of faults in the factory. Clustering the industrial fault text data can reduce the redundant data and find out the hidden information, which is of great significance to improve the utilization of the industrial fault text data. The industrial fault text data are unstructured and irregular, so the clustering faces quite a few challenges. This paper introduces some existing algorithms for the clustering of short texts, and the shortcomings of them are briefly analyzed. This paper indicates that the main problem of the clustering of the industrial fault text data is the contradiction between the requirements and the setup of parameters, and it leads to low accuracy when cluster the corpus of different sizes. To increase the accuracy of clustering, an improved clustering algorithm is proposed which can solve this contradiction. The results of the comparative experiments show that the improved clustering algorithm has better performance than DBSCAN in corpus of different sizes on the industrial fault text data.
改进的DBSCAN聚类算法在工业故障文本数据中的应用
工业故障文本数据是一种特殊类型的短文本,它来源于工厂的故障记录。对工业故障文本数据进行聚类,可以减少冗余数据,发现隐藏信息,对提高工业故障文本数据的利用率具有重要意义。工业故障文本数据是非结构化和不规则的,因此聚类面临着很大的挑战。本文介绍了现有的短文本聚类算法,并简要分析了它们的不足。本文指出,工业故障文本数据聚类的主要问题是参数设置与要求之间的矛盾,导致对不同大小的语料进行聚类时准确率较低。为了提高聚类的准确率,提出了一种改进的聚类算法来解决这一矛盾。对比实验结果表明,改进的聚类算法在不同大小的工业故障文本数据语料库上的聚类性能优于DBSCAN算法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信