Ferenc Tolner, Sándor Fegyverneki, Balázs Barta, György Eigner
Multidiszciplináris tudományok, journal article, published 2023-01-01. DOI: 10.35925/j.multi.2023.1.11 (https://doi.org/10.35925/j.multi.2023.1.11)
Robust clustering based on the most frequent value method
Assigning observations to highly separable yet relatively homogeneous groups remains a challenging task despite the abundance of well-elaborated theories and effective, practical algorithms. Not only the aim of clustering but also the underlying data itself influences the choice of method and the way the results are assessed. Outliers and non-normally distributed data can lead to surprising, unstable, and often undesirable clustering results, especially in higher dimensions. This underlines the importance of some human supervision even for such unsupervised algorithms. This paper presents a robust clustering alternative based on the Most Frequent Value Method for crisp-type clustering of real-life data. The proposed approach is compared with the k-Medians algorithm. A favourable attribute of the applied procedure is its ease of application to multidimensional data sets, where critical judgment of the formed groups is particularly troublesome.
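The abstract does not spell out the estimator itself. As a rough illustration only, the sketch below assumes the iterative one-dimensional formulation of the most frequent value commonly attributed to Steiner, in which a location M and a scale ε (the "dihesion") are updated alternately with Cauchy-type weights. The function name, starting values, tolerance, and iteration cap are hypothetical choices made for this example and are not taken from the paper; the clustering procedure built on top of the estimator is not reproduced here.

```python
import math
import statistics


def most_frequent_value(data, tol=1e-9, max_iter=500):
    """Return (m, eps): an MFV-style location estimate and its scale.

    Assumed (not taken from the paper) update rules, with d_i = x_i - m:
      eps^2 = 3 * sum(d_i^2 / (eps^2 + d_i^2)^2) / sum(1 / (eps^2 + d_i^2)^2)
      m     = sum(w_i * x_i) / sum(w_i),  w_i = eps^2 / (eps^2 + d_i^2)
    """
    xs = [float(x) for x in data]
    if not xs:
        raise ValueError("data must not be empty")

    m = statistics.median(xs)                          # robust starting location
    eps = math.sqrt(3.0) / 2.0 * (max(xs) - min(xs))   # wide starting scale
    if eps == 0.0:                                     # all values identical
        return m, 0.0

    for _ in range(max_iter):
        sq = [(x - m) ** 2 for x in xs]                # squared deviations from m
        e2 = eps * eps

        # Scale update using the current location and the current scale.
        num = sum(d / (e2 + d) ** 2 for d in sq)
        den = sum(1.0 / (e2 + d) ** 2 for d in sq)
        new_eps = math.sqrt(3.0 * num / den)
        if new_eps == 0.0:                             # degenerate: no spread left
            return m, 0.0

        # Location update: points far from m (relative to eps) get small weights.
        ne2 = new_eps * new_eps
        weights = [ne2 / (ne2 + d) for d in sq]
        new_m = sum(w * x for w, x in zip(weights, xs)) / sum(weights)

        converged = abs(new_m - m) < tol and abs(new_eps - eps) < tol
        m, eps = new_m, new_eps
        if converged:
            break

    return m, eps


if __name__ == "__main__":
    sample = [1, 2, 3, 4, 5, 1000]                     # one gross outlier
    location, scale = most_frequent_value(sample)
    print("MFV-style location:", location)
    print("arithmetic mean:   ", statistics.mean(sample))
```

On the sample above, the weighted estimate stays close to the bulk of the data (near 3), whereas the arithmetic mean is pulled to roughly 169 by the single outlier. This insensitivity is what makes such an estimator a candidate for cluster centres when outliers are present.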