Application of Improved DBSCAN Clustering Algorithm on Industrial Fault Text Data

2020 IEEE 18th International Conference on Industrial Informatics (INDIN), Pub Date: 2020-07-20, DOI: 10.1109/INDIN45582.2020.9442093

Xiaohan Wang, Lin Zhang, Xuesong Zhang, Kunyu Xie

Abstract

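Industrial fault text data are a special type of short text drawn from the fault records kept in factories. Clustering these data can reduce redundancy and reveal hidden information, which is important for making better use of them. Because industrial fault text data are unstructured and irregular, clustering them poses considerable challenges. This paper reviews several existing algorithms for short-text clustering and briefly analyzes their shortcomings. It argues that the main problem in clustering industrial fault text data is a conflict between the clustering requirements and the parameter settings, which leads to low accuracy when corpora of different sizes are clustered. To improve clustering accuracy, the paper proposes an improved clustering algorithm that resolves this conflict. Comparative experiments show that the improved algorithm outperforms DBSCAN on industrial fault text corpora of different sizes.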
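The abstract does not describe the improved algorithm itself, so the sketch below only illustrates the baseline it is compared against: classic DBSCAN, whose two global parameters (`eps`, the neighborhood radius, and `min_pts`, the minimum neighborhood size including the point itself) must be fixed before clustering. It is a minimal sketch under stated assumptions, not the authors' method: short texts are represented as whitespace-tokenized bag-of-words vectors compared by cosine distance, and the six fault records are hypothetical examples, not data from the paper. Running it with two `eps` values shows one way a single fixed parameter choice can turn the same records into clusters or into noise.

```python
import math
from collections import Counter


def vectorize(text):
    # Assumption: plain whitespace tokenization into term counts (no stemming, no TF-IDF)
    return Counter(text.lower().split())


def cosine_distance(a, b):
    # 1 - cosine similarity between two sparse term-count vectors
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)


def dbscan(points, eps, min_pts, dist):
    """Classic DBSCAN. Returns one label per point; -1 marks noise."""
    NOISE = -1
    labels = [None] * len(points)  # None means "not yet visited"

    def region_query(i):
        # Indices of all points within eps of point i, including i itself
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(j)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point: keep expanding
        cluster += 1
    return labels


# Hypothetical fault records, for illustration only
texts = [
    "spindle motor overheating alarm",
    "spindle motor overheating",
    "motor overheating alarm spindle stopped",
    "hydraulic pump pressure low",
    "hydraulic pump pressure drop",
    "conveyor belt misaligned",
]
points = [vectorize(t) for t in texts]

labels_loose = dbscan(points, eps=0.5, min_pts=2, dist=cosine_distance)
labels_tight = dbscan(points, eps=0.2, min_pts=2, dist=cosine_distance)
print(labels_loose)  # [0, 0, 0, 1, 1, -1]
print(labels_tight)  # [0, 0, 0, -1, -1, -1]
```

With `eps=0.5` the two hydraulic records (cosine distance 0.25) form a second cluster; with `eps=0.2` they fall out as noise while the spindle records still cluster. How the paper's improved algorithm avoids this kind of sensitivity is not stated in the abstract, so nothing here should be read as its design.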
