Estimating the Number of Clusters in High-Dimensional Large Datasets

IF 0.7 | Zone 4, Computer Science | JCR Q4, COMPUTER SCIENCE, SOFTWARE ENGINEERING

International Journal of Data Warehousing and Mining | Pub Date: 2023-01-13 | DOI: 10.4018/ijdwm.316142

Xutong Zhu, Lingli Li

Pages: 1-14 | Publication type: Journal Article | Open access: no

Citations: 0

Abstract

Clustering is a basic primitive of exploratory tasks. To obtain valuable results, the parameters of the clustering algorithm, in particular the number of clusters, must be set appropriately. Existing methods for determining the number of clusters perform well on small, low-dimensional datasets, but determining the optimal number of clusters effectively on large, high-dimensional datasets remains a challenging problem. In this paper, the authors design a method for estimating the optimal number of clusters on large-scale, high-dimensional datasets that overcomes the shortcomings of existing estimation methods and produces its estimate both accurately and quickly. Extensive experiments show that it (1) outperforms existing estimation methods in accuracy and efficiency, (2) generalizes across different datasets, and (3) is suitable for large, high-dimensional datasets.
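The abstract does not describe the authors' method, so the following is only a minimal sketch of the kind of existing estimator it refers to: run k-means for each candidate k and keep the k with the highest mean silhouette score. The choice of silhouette, the farthest-first seeding, and every name here (`farthest_first`, `kmeans`, `mean_silhouette`, `estimate_k`, `make_blobs`) are assumptions made for illustration, not taken from the paper. The exact silhouette needs all pairwise distances, so its cost grows quadratically with the number of points, which hints at why baselines of this kind become expensive on large datasets.

```python
import math
import random


def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point that is farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # an empty cluster keeps its center
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return labels


def mean_silhouette(points, labels):
    """Average silhouette coefficient over all points (O(n^2) distances)."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        return -1.0  # silhouette is undefined for a single cluster
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # convention: a singleton cluster contributes 0
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in idx) / len(idx)
            for other, idx in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


def estimate_k(points, k_min=2, k_max=6):
    """Return the k in [k_min, k_max] with the highest mean silhouette."""
    best_k, best_score = k_min, -2.0
    for k in range(k_min, k_max + 1):
        score = mean_silhouette(points, kmeans(points, k))
        if score > best_score:
            best_k, best_score = k, score
    return best_k


def make_blobs(centers, n_per_cluster=30, spread=0.5, seed=1):
    """Synthetic test data: Gaussian blobs around the given centers."""
    rng = random.Random(seed)
    return [
        tuple(rng.gauss(c, spread) for c in center)
        for center in centers
        for _ in range(n_per_cluster)
    ]


if __name__ == "__main__":
    # Three well-separated blobs in four dimensions (toy data, not from the paper).
    data = make_blobs([(0, 0, 0, 0), (10, 10, 10, 10), (-10, 10, 0, 5)])
    print("estimated number of clusters:", estimate_k(data))
```

On toy data like this the sweep is cheap, but each candidate k costs a full clustering run plus a quadratic-time score, which is the scaling problem the abstract points to for large, high-dimensional inputs.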


Source journal

International Journal of Data Warehousing and Mining (COMPUTER SCIENCE, SOFTWARE ENGINEERING)

CiteScore: 2.40
Self-citation rate: 0.00%
Articles published: not listed
Review time: >12 weeks

About the journal: The International Journal of Data Warehousing and Mining (IJDWM) disseminates the latest international research findings in the areas of data management and analysis. IJDWM provides a forum for state-of-the-art developments and research, as well as current innovative activities focusing on the integration between the fields of data warehousing and data mining. Emphasizing applicability to real-world problems, the journal meets the needs of both academic researchers and practicing IT professionals. The journal is devoted to the publication of high-quality papers on theoretical developments and practical applications in data warehousing and data mining. Original research papers, state-of-the-art reviews, and technical notes are invited for publication. The journal accepts submissions of any work relevant to data warehousing and data mining. Special attention will be given to papers focusing on mining of data from data warehouses; integration of databases, data warehousing, and data mining; and holistic approaches to mining and archiving