重新审视k均值分裂:有充分根据的方法和实验评估

V. Grigorev, G. Chernishev
{"title":"重新审视k均值分裂:有充分根据的方法和实验评估","authors":"V. Grigorev, G. Chernishev","doi":"10.1145/2882903.2914833","DOIUrl":null,"url":null,"abstract":"R-tree is a data structure used for multidimensional indexing. Essentially, it is a balanced tree consisting of nested hyper-rectangles which are used to locate the data. One of the most performance sensitive parts of this data structure is its split algorithm, which runs during node overflows. The split can be performed in multiple ways, according to many different criteria and in general the problem of finding an optimal solution is NP-hard. There are many heuristic split algorithms. In this paper we study an existing k-means node split algorithm. We describe a number of serious issues in its theoretical foundation, which made us to re-design k-means split. We propose several well-grounded solutions to the re-emerged problem of k-means split. Finally, we report the comparison results using PostgreSQL and contemporary benchmark for multidimensional structures.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"16 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"K-means Split Revisited: Well-grounded Approach and Experimental Evaluation\",\"authors\":\"V. Grigorev, G. Chernishev\",\"doi\":\"10.1145/2882903.2914833\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"R-tree is a data structure used for multidimensional indexing. Essentially, it is a balanced tree consisting of nested hyper-rectangles which are used to locate the data. One of the most performance sensitive parts of this data structure is its split algorithm, which runs during node overflows. The split can be performed in multiple ways, according to many different criteria and in general the problem of finding an optimal solution is NP-hard. There are many heuristic split algorithms. In this paper we study an existing k-means node split algorithm. We describe a number of serious issues in its theoretical foundation, which made us to re-design k-means split. We propose several well-grounded solutions to the re-emerged problem of k-means split. Finally, we report the comparison results using PostgreSQL and contemporary benchmark for multidimensional structures.\",\"PeriodicalId\":20483,\"journal\":{\"name\":\"Proceedings of the 2016 International Conference on Management of Data\",\"volume\":\"16 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-06-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2016 International Conference on Management of Data\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2882903.2914833\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2016 International Conference on Management of Data","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2882903.2914833","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

摘要

R-tree是一种用于多维索引的数据结构。本质上,它是一个由嵌套的超矩形组成的平衡树,用于定位数据。这种数据结构对性能最敏感的部分之一是它的分割算法,该算法在节点溢出时运行。根据许多不同的标准,拆分可以以多种方式执行,通常找到最优解的问题是np困难的。有许多启发式分割算法。本文研究了一种现有的k-均值节点分割算法。我们在其理论基础上描述了一些严重的问题,这使得我们重新设计k-means分裂。对于再次出现的k-均值分裂问题,我们提出了几个有根据的解决方案。最后,我们报告了使用PostgreSQL和当代多维结构基准的比较结果。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
K-means Split Revisited: Well-grounded Approach and Experimental Evaluation
R-tree is a data structure used for multidimensional indexing. Essentially, it is a balanced tree consisting of nested hyper-rectangles which are used to locate the data. One of the most performance sensitive parts of this data structure is its split algorithm, which runs during node overflows. The split can be performed in multiple ways, according to many different criteria and in general the problem of finding an optimal solution is NP-hard. There are many heuristic split algorithms. In this paper we study an existing k-means node split algorithm. We describe a number of serious issues in its theoretical foundation, which made us to re-design k-means split. We propose several well-grounded solutions to the re-emerged problem of k-means split. Finally, we report the comparison results using PostgreSQL and contemporary benchmark for multidimensional structures.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信