{"title":"A Fast Clustering Algorithm for Hybrid Big Data Considering the Global Distribution Information of Samples","authors":"Wen Tian, Lei Shen","doi":"10.1109/PHM-Yantai55411.2022.9941899","DOIUrl":null,"url":null,"abstract":"In view of the poor clustering accuracy of current hybrid large data fast clustering algorithms, a hybrid large data fast clustering algorithm considering global distribution information is proposed. Rough set algorithm is used to collect mixed data samples considering global distribution information of samples. The original mixed data entropy is calculated to complete the initial data partition. MapReduce is combined with the classical spectral clustering algorithm to complete the hybrid large data clustering analysis. So far, the hybrid big data clustering algorithm considering global distribution information of samples is designed. The experimental findings demonstrate that this method's clustering accuracy is comparatively high and that excellent clustering outcomes may be attained.","PeriodicalId":315994,"journal":{"name":"2022 Global Reliability and Prognostics and Health Management (PHM-Yantai)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 Global Reliability and Prognostics and Health Management (PHM-Yantai)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PHM-Yantai55411.2022.9941899","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citation count: 0
Abstract
To address the poor clustering accuracy of current fast clustering algorithms for hybrid big data, a fast clustering algorithm for hybrid big data that considers the global distribution information of samples is proposed. A rough set algorithm is used to collect mixed-data samples while taking the global distribution information of the samples into account. The entropy of the original mixed data is then calculated to produce the initial data partition. Finally, MapReduce is combined with the classical spectral clustering algorithm to carry out the clustering analysis of the hybrid big data, which completes the design of the algorithm. The experimental results show that the method achieves comparatively high clustering accuracy and good clustering outcomes.
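The abstract does not give the entropy formula or the MapReduce job layout, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the standard Shannon entropy of a single categorical attribute and computes the value counts in a map/reduce style. The function names (`map_counts`, `reduce_counts`, `shannon_entropy`) and the sample data are hypothetical.

```python
import math
from collections import Counter
from functools import reduce


def map_counts(chunk):
    """Map step: count attribute values within one chunk of records."""
    return Counter(chunk)


def reduce_counts(left, right):
    """Reduce step: merge two partial count tables."""
    return left + right


def shannon_entropy(counts):
    """Shannon entropy (in bits) of the empirical distribution in `counts`."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical categorical attribute, split into two chunks the way a
# distributed job might see it. The values are made up for illustration.
chunks = [
    ["red", "blue", "red", "green"],
    ["blue", "red", "green", "blue"],
]

# Merge the per-chunk counts into one global count table, then score it.
global_counts = reduce(reduce_counts, map(map_counts, chunks), Counter())
entropy_bits = shannon_entropy(global_counts)
print(global_counts, entropy_bits)
```

Because the counts are merged before the entropy is computed, the result reflects the distribution of the whole data set rather than of any single chunk. How the paper turns such an entropy value into an initial partition is not stated in the abstract.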