The method of clusters stability assessing

Aleksejs Lozkins, V. Bure
DOI: 10.1109/SCP.2015.7342177
Published in: 2015 International Conference "Stability and Control Processes" in Memory of V.I. Zubov (SCP)
Publication date: 2015-12-03
Citations: 1

Abstract

The article proposes a method for estimating the “correct” number of clusters. It uses the stability methodology and introduces a quantitative assessment of the stability level. This problem is important and common in cluster analysis, and no single criterion works for all dataset types. The suggested approach also solves a second problem. In socio-economic data analysis, the initial numerical data often contain various kinds of inaccuracies (measurement errors, intentional misrepresentations, calculation errors, errors of the mathematical model, and other possible sources of error). It is therefore important to choose classifications of the study objects that are stable with respect to random noise. The work aims to obtain a reliable and accurate clustering that is stable with respect to random perturbations and to solve an “ill-posed” problem of cluster analysis, namely finding the appropriate number of clusters. The paper offers a probabilistic approach to this problem. The variability frequency under random perturbation is introduced and examined as the main metric for assessing clustering results. This estimate can be used with different clustering algorithms, and their stability indices can be compared without additional procedures. An experiment on artificial data using k-means clustering is carried out.
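The abstract does not give a precise definition of the variability frequency, so the following Python sketch is only an illustration under one assumed reading, not the authors' method. For each candidate number of clusters k, it clusters the original data with k-means, then repeatedly adds Gaussian noise (standard deviation `sigma`) to every coordinate, reclusters, and records the fraction of point pairs whose "same cluster / different cluster" relation changes. The average of that fraction is treated as the variability for k; lower values suggest a more stable k. The function names (`variability_frequency`, `stability_profile`, `pair_disagreement`, and the helpers), the noise model, the pairwise measure, and all parameter defaults are hypothetical choices made for this example.

```python
import random
from itertools import combinations


def _sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda j: _sq_dist(p, centers[j]))


def _kmeans_once(points, k, rng, iters=100):
    """One run of Lloyd's k-means with k-means++ seeding; returns (labels, inertia)."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # k-means++: sample the next center with probability proportional to the
        # squared distance from the nearest center chosen so far.
        weights = [min(_sq_dist(p, c) for c in centers) for p in points]
        if sum(weights) == 0:  # every point coincides with a chosen center
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights)[0])
    labels = None
    for _ in range(iters):
        new_labels = [_nearest(p, centers) for p in points]  # assignment step
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        for j in range(k):  # update step: move each center to its members' mean
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    inertia = sum(_sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, inertia


def kmeans(points, k, rng, restarts=5):
    """Labels from the lowest-inertia run out of several random restarts."""
    runs = [_kmeans_once(points, k, rng) for _ in range(restarts)]
    return min(runs, key=lambda run: run[1])[0]


def pair_disagreement(a, b):
    """Fraction of point pairs that are together in one labeling but apart in the
    other. Independent of how each run happens to number its clusters."""
    pairs = list(combinations(range(len(a)), 2))
    changed = sum((a[i] == a[j]) != (b[i] == b[j]) for i, j in pairs)
    return changed / len(pairs)


def perturb(points, sigma, rng):
    """Copy of the data with independent Gaussian noise added to every coordinate."""
    return [tuple(x + rng.gauss(0.0, sigma) for x in p) for p in points]


def variability_frequency(points, k, sigma=0.1, trials=20, seed=0):
    """Hypothetical stability score: mean pair disagreement between the clustering
    of the original data and clusterings of `trials` noisy copies (lower = stabler)."""
    rng = random.Random(seed)
    reference = kmeans(points, k, rng)
    total = sum(
        pair_disagreement(reference, kmeans(perturb(points, sigma, rng), k, rng))
        for _ in range(trials)
    )
    return total / trials


def stability_profile(points, ks, sigma=0.1, trials=20, seed=0):
    """Variability score for each candidate number of clusters in ks."""
    return {k: variability_frequency(points, k, sigma, trials, seed) for k in ks}
```

For example, `stability_profile(points, range(2, 8), sigma=0.2)` returns a dictionary that maps each candidate k to its average variability. k = 1 is left out of the range because a single cluster is trivially perfectly stable. The pairwise measure is used because it does not depend on how a run numbers its clusters, so partitions from separate runs can be compared without first matching labels; for the same reason, `kmeans` could in principle be swapped for any other algorithm that returns one label per point, in the spirit of the abstract's remark that the estimate applies to different clustering algorithms.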