Alpha Lightweight Coreset for k-Means Clustering
N. Hoang, T. K. Dang
2022 16th International Conference on Ubiquitous Information Management and Communication (IMCOM), published 2022-01-03
DOI: https://doi.org/10.1109/IMCOM53663.2022.9721770
Citations: 3
Abstract
The evolution of the Internet and personal devices has brought our modern world into a new age of data. Data today is not only large in volume but also high in variety and velocity. As a result, data scientists must investigate and propose new methods for dealing with big data. One common approach is, instead of solving a problem on the whole large-scale data set, to solve it on a subset of the data; that result is then used as a baseline for finding the actual solution on the original data set. To obtain the best final results, we have to find the best coreset: a subset small enough to effectively reduce computational complexity while still preserving the representative characteristics of the original data. In this paper, building on the lightweight coreset, we propose a general coreset construction for k-means clustering, named the α-lightweight coreset, which introduces a new adjustable parameter. Our experimental results show that the proposed method can create a good coreset for the k-means clustering problem.
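
The abstract does not state the sampling formula, so the following is only a minimal sketch under a stated assumption. The standard lightweight coreset samples each point with probability q(x) = 1/2 · 1/n + 1/2 · d(x, μ)² / Σ d(x', μ)², where μ is the mean of the data, and gives each sampled point the weight 1/(m · q(x)). The sketch assumes that the adjustable parameter replaces the fixed 1/2 mixing weight, so that q(x) = α/n + (1 − α) · d(x, μ)² / Σ d(x', μ)². This reading, the function name `alpha_lightweight_coreset`, and its parameters are hypothetical and may differ from the paper's actual definition.

```python
import random


def alpha_lightweight_coreset(points, m, alpha=0.5, seed=None):
    """Sample a weighted subset of size m from `points`.

    `points` is a non-empty list of equal-length sequences of floats.
    `alpha` is an ASSUMED mixing weight between uniform sampling and
    sampling proportional to squared distance from the mean; alpha=0.5
    reproduces the standard lightweight coreset distribution.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    n = len(points)
    dim = len(points[0])

    # Mean of the full data set.
    mean = [sum(p[j] for p in points) / n for j in range(dim)]

    # Squared Euclidean distance of every point to the mean.
    sq_dist = [sum((p[j] - mean[j]) ** 2 for j in range(dim)) for p in points]
    total = sum(sq_dist)

    if total == 0.0:
        # All points coincide with the mean: fall back to uniform sampling.
        q = [1.0 / n] * n
    else:
        # Mixture of a uniform term and a distance-proportional term.
        q = [alpha / n + (1.0 - alpha) * d / total for d in sq_dist]

    # Draw m indices with replacement according to q.
    rng = random.Random(seed)
    idx = rng.choices(range(n), weights=q, k=m)

    # Importance weights keep the weighted clustering cost unbiased.
    coreset = [points[i] for i in idx]
    weights = [1.0 / (m * q[i]) for i in idx]
    return coreset, weights
```

The returned points and weights would then be passed to a weighted k-means solver (for example, one that accepts per-sample weights) in place of the full data set. Under this assumed form, α = 1 reduces to plain uniform subsampling, and smaller α puts more sampling mass on points far from the mean.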