Clustering Large-Scale Data using an Incremental Heap Self-Organizing Map

International Journal of Information and Communication Technology Research. Pub Date: 2022-06-01. DOI: 10.52547/itrc.14.2.41

M. Fasanghari, Helena Bahrami, Hamideh Sadat Cheraghchi


Citations: 1

Abstract


In machine learning and data analysis, clustering large amounts of data is one of the most challenging tasks. Many fields, including research, health, social life, and commerce, rely on information that is generated every second. The significance of this data in all facets of contemporary life has prompted numerous attempts to develop new methods for analyzing it at scale. This research proposes an Incremental Heap Self-Organizing Map (IHSOM) for clustering a vast and continually growing amount of data. The incremental nature of IHSOM lets it accommodate environments that change and evolve; in other words, IHSOM can quickly adapt to the size of a dataset. The binary heap tree structure of the proposed approach offers several advantages over other structures. First, the topology, or neighborhood relationship, between data in the input space is preserved in the output space. Second, outliers are routed to the tree's leaf nodes, where they can be managed efficiently. This capability is provided by a probability density function that serves as a threshold: more similar data are allocated to a cluster, and less similar data are passed on to the following node. The process of pruning and expanding nodes makes the algorithm noise-resistant, more precise in clustering, and memory-efficient. The heap tree structure also accelerates node traversal and reorganization after nodes are added or deleted. IHSOM's simple user-defined parameters make it a practical unsupervised clustering approach. The performance of the proposed algorithm is evaluated on both synthetic and real-world datasets and compared with existing hierarchical self-organizing maps and clustering algorithms. The results demonstrate IHSOM's proficiency in clustering.
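The abstract does not give the algorithm's details, so the following is only a rough sketch of the general idea it describes: a density value used as an acceptance threshold, with samples that no node accepts ending up in a newly created leaf. Everything concrete here is an assumption made for illustration and is not taken from the paper: the class name `HeapSomSketch`, the parameters `threshold`, `sigma`, and `learning_rate`, the isotropic Gaussian kernel, the reading of "the following node" as the next node in level order, and the count-based pruning rule.

```python
import math


class HeapSomSketch:
    """Toy illustration of density-threshold routing; NOT the authors' IHSOM."""

    def __init__(self, threshold=0.5, sigma=1.0, learning_rate=0.1):
        self.threshold = threshold          # hypothetical acceptance threshold
        self.sigma = sigma                  # hypothetical kernel width
        self.learning_rate = learning_rate  # hypothetical update step
        # Array-backed complete binary tree: children of node i sit at 2i+1 and 2i+2.
        self.weights = []
        self.counts = []

    def _density(self, x, w):
        # Unnormalised isotropic Gaussian, so the value lies in (0, 1].
        d2 = sum((a - b) ** 2 for a, b in zip(x, w))
        return math.exp(-d2 / (2.0 * self.sigma ** 2))

    def insert(self, x):
        """Route x through the nodes in level order; return the accepting node's index."""
        for i, w in enumerate(self.weights):
            if self._density(x, w) >= self.threshold:
                # Similar enough: move the node weight toward the sample.
                self.weights[i] = [wj + self.learning_rate * (xj - wj)
                                   for wj, xj in zip(w, x)]
                self.counts[i] += 1
                return i
        # No node accepted the sample: append a new node, which is always a leaf.
        self.weights.append(list(x))
        self.counts.append(1)
        return len(self.weights) - 1

    def is_leaf(self, i):
        # In the array layout, a node is a leaf when its first child index is out of range.
        return 2 * i + 1 >= len(self.weights)

    def prune(self, min_count):
        """Drop sparsely populated nodes and compact the array."""
        kept = [(w, c) for w, c in zip(self.weights, self.counts) if c >= min_count]
        self.weights = [w for w, _ in kept]
        self.counts = [c for _, c in kept]


som = HeapSomSketch(threshold=0.5, sigma=1.0)
data = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 4.9), (20.0, -20.0)]
labels = [som.insert(p) for p in data]
```

In this toy run the two tight groups share nodes 0 and 1, while the isolated point `(20.0, -20.0)` gets its own node at the end of the array, which is a leaf in the implicit tree. Calling `som.prune(2)` would then discard that singleton node. Note that the linear scan is chosen for brevity; it does not reproduce the traversal speed-up the abstract attributes to the heap structure.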


Source journal

International Journal of Information and Communication Technology Research

Self-citation rate

0.00%

Publication count