Authors: Rogelio O. Badiang, B. Gerardo, Ruji P. Medina
Venue: 2019 International Conference on Advanced Computer Science and Information Systems (ICACSIS)
Published: 2019-10-01
DOI: 10.1109/ICACSIS47736.2019.8979741
Citations: 4
Relocating Local Outliers Produced by K-means and K-medoids Using Local Outlier Rectifier V.2.0
The extensive growth of information and communication technology makes it easy to capture massive amounts of valuable data in many domains, and these data feed a wide range of data mining techniques. However, datasets sometimes contain outliers. One category of outlier is the local outlier: a data point that deviates locally from its cluster center. Local outliers occur when the cluster center, known as the centroid or medoid, cannot represent all members of the cluster; the unrepresented points are then mistakenly assigned to their closest cluster, which makes them local outliers. This study addresses the problem of local outliers produced by K-means and K-medoids. The Local Outlier Rectifier V.2.0 (LOR V.2.0) is a method that relocates local outliers to their correct clusters. Simulations show that, when partnered with K-means, LOR V.2.0 relocated 35.37%, 34.78%, 25%, and 12.28% of the local outliers in the Ionosphere, Breast Cancer Wisconsin, Iris, and Breast Cancer Coimbra datasets, respectively. When partnered with K-medoids, by comparison, it transferred 29.67% of the local outliers in Breast Cancer Wisconsin, 29.11% in Ionosphere, 25.0% in Iris, and 10.34% in Breast Cancer Coimbra to their correct clusters. The results also indicate that the method works better when partnered with K-means.
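The abstract defines a local outlier only informally and does not give the steps of LOR V.2.0. As a hypothetical illustration of the concept (not the authors' method), the Python sketch below runs plain K-means and then flags points whose distance to their own center is unusually large compared with the rest of their cluster. The farthest-first initialization, the cutoff rule `mean + z * std` with `z = 1.5`, and all function names are assumptions; the relocation step is deliberately omitted because the abstract does not say how LOR V.2.0 decides which cluster is correct.

```python
import math
from statistics import mean, pstdev

Point = tuple[float, ...]


def centroid(members: list[Point]) -> Point:
    """Coordinate-wise mean of the members (the K-means cluster center)."""
    return tuple(sum(axis) / len(members) for axis in zip(*members))


def medoid(members: list[Point]) -> Point:
    """Member with the smallest total distance to all members (the K-medoids center)."""
    return min(members, key=lambda m: sum(math.dist(m, o) for o in members))


def kmeans(points: list[Point], k: int, max_iter: int = 100) -> tuple[list[Point], list[int]]:
    """Plain Lloyd's K-means; farthest-first initialization is an assumption."""
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from every center chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(centroid(members) if members else centers[j])
        if new_centers == centers:
            break  # converged: the centers stopped moving
        centers = new_centers
    return centers, labels


def flag_local_outliers(points: list[Point], labels: list[int],
                        centers: list[Point], z: float = 1.5) -> list[int]:
    """Hypothetical rule (assumption, not LOR V.2.0): flag members whose distance
    to their own center exceeds mean + z * std of that cluster's member distances."""
    flagged = []
    for j, center in enumerate(centers):
        idx = [i for i, lab in enumerate(labels) if lab == j]
        if len(idx) < 2:
            continue  # spread is undefined for empty or singleton clusters
        dists = [math.dist(points[i], center) for i in idx]
        cutoff = mean(dists) + z * pstdev(dists)
        flagged.extend(i for i, d in zip(idx, dists) if d > cutoff)
    return sorted(flagged)


if __name__ == "__main__":
    a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
    b = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
    pts = a + b + [(4.0, 4.0)]  # (4, 4) is poorly represented by either center
    centers, labels = kmeans(pts, 2)
    print(labels)                                   # [0, 0, 0, 0, 1, 1, 1, 1, 0]
    print(flag_local_outliers(pts, labels, centers))  # [8]
```

Passing medoids instead of centroids (one `medoid(members)` per cluster) applies the same rule to K-medoids centers; on the toy data above it flags the same point. One caveat of a z-score cutoff: with the population standard deviation, no point in an n-member cluster can exceed z = sqrt(n - 1) (Samuelson's inequality), so small clusters need a smaller `z`.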