Electricity data analysis method based on K-means clustering algorithm
Qinyi Lei, Cong Hu, Dehua Hong, Cuiling Liu, Linyan Zhao, Qiu-Ju Sun
2022 IEEE 6th Information Technology and Mechatronics Engineering Conference (ITOEC), 2022-03-04
DOI: 10.1109/ITOEC53115.2022.9734317
As the volume of grid data grows rapidly, more and more data must be backed up and restored by the data backup system. However, the similarity between backup files exceeds 60%, so storing every file in full on the hard disk wastes storage space. To remove duplicate data from backups, a DELTA compression method based on K-medoids clustering is proposed. Pairwise DELTA compression is first performed on the file blocks, and the size of each compressed result is taken as the similarity measure between the two blocks involved. K-medoids clustering is then performed on these similarities as a preprocessing step before DELTA compression. Finally, according to the K-medoids clustering results, small file blocks are merged and DELTA compression is applied. Test results show that the method improves the compression rate and reduces the number of fingerprint searches in DELTA compression.
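The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. It assumes that zlib compression with a preset dictionary is an acceptable stand-in for a real DELTA encoder, that the two directional sizes are summed to get a symmetric distance, and that a simple alternating K-medoids with farthest-first initialisation is good enough. All function names and the synthetic blocks are hypothetical.

```python
import random
import zlib


def delta_size(target: bytes, reference: bytes) -> int:
    """Compressed size of `target` with `reference` as a preset dictionary.

    Assumption: zlib with a dictionary stands in for a real DELTA encoder.
    A smaller size means the two blocks are more similar.
    """
    comp = zlib.compressobj(level=9, zdict=reference)
    return len(comp.compress(target) + comp.flush())


def distance_matrix(blocks):
    """Pairwise distances; both directions are summed to make them symmetric."""
    n = len(blocks)
    dist = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = delta_size(blocks[i], blocks[j]) + delta_size(blocks[j], blocks[i])
            dist[i][j] = dist[j][i] = d
    return dist


def assign(dist, medoids):
    """Label each block with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(dist, k, max_iter=100):
    """Alternating K-medoids on a precomputed distance matrix."""
    n = len(dist)
    medoids = [0]  # farthest-first initialisation, starting from block 0
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        updated = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            # The new medoid minimises the total distance to its cluster members.
            updated.append(min(members, key=lambda m: sum(dist[m][j] for j in members))
                           if members else medoids[c])
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(dist, medoids)


def merge_by_cluster(blocks, labels):
    """Concatenate the blocks of each cluster before the final compression pass."""
    merged = {}
    for block, label in zip(blocks, labels):
        merged[label] = merged.get(label, b"") + block
    return merged


# Synthetic demo: two families of near-duplicate blocks (hypothetical data).
rng = random.Random(0)
base_a = bytes(rng.getrandbits(8) for _ in range(2000))
base_b = bytes(rng.getrandbits(8) for _ in range(2000))
blocks = [base_a, base_a[:1000] + b"edit" + base_a[1000:], base_a + b"tail",
          base_b, base_b[:500] + b"edit" + base_b[500:], base_b + b"tail"]

dist = distance_matrix(blocks)
medoids, labels = k_medoids(dist, k=2)
merged = merge_by_cluster(blocks, labels)
print(medoids, labels)
```

Working from a precomputed distance matrix is the reason K-medoids fits here: delta sizes give pairwise dissimilarities but no coordinates, so a centroid as used by K-means cannot be computed directly.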