DATA CLUSTERING BASED ON INDUCTIVE LEARNING OF NEURO-FUZZY NETWORK WITH DISTANCE HASHING

IF 0.3 Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Radio Electronics Computer Science Control. Pub Date: 2022-12-09. DOI: 10.15588/1607-3274-2022-4-6

S. Subbotin

Citations: 0

Abstract

Context. Cluster analysis is widely used to analyze data of various nature and dimensionality. However, known cluster analysis methods are slow and memory-intensive because they must calculate pairwise distances between instances in a multidimensional feature space. In addition, their results are difficult for humans to perceive and analyze when the number of features is large.

Objective. The purpose of the work is to increase the speed of cluster analysis and the interpretability of the resulting partition into clusters, and to reduce the computer memory requirements of cluster analysis.

Method. A method for cluster analysis of multidimensional data is proposed. For each instance, it calculates a hash based on the distance to the conditional center of coordinates and uses the one-dimensional coordinate along the hash axis to determine the distances between instances. It treats the resulting hash as a pseudo-output feature and breaks it into intervals, which are matched with the labels of pseudo-classes (clusters). Having thus obtained a rough crisp partition of the feature space and the sample instances, it automatically generates a partition of the input features into fuzzy terms and determines the rules for assigning instances to clusters. As a result, it forms a fuzzy inference system of the Mamdani-Zadeh classifier type, which is further trained as a neuro-fuzzy network to ensure acceptable values of the clustering quality functional. This makes it possible to reduce the number of terms and features used, to evaluate their contribution to decisions about assigning instances to clusters, to increase the speed of cluster analysis, and to increase the interpretability of the resulting partition of the data into clusters.

Results. Mathematical support for solving the problem of cluster analysis of data of large dimensionality has been developed. Experiments confirming the operability of the developed mathematical support have been carried out.

Conclusions. The developed method and its software implementation can be recommended for practical use in problems of analyzing data of various nature and dimensionality.
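The first two steps of the Method paragraph (hashing each instance by its distance to a reference point, then splitting the hash axis into intervals that serve as pseudo-class labels) can be illustrated with a minimal sketch. It is not the author's implementation. The abstract does not define the "conditional center of coordinates" or the interval rule, so the sketch assumes the per-feature minimum as the center, Euclidean distance as the hash, and equal-width intervals; the function names `distance_hash` and `rough_partition` are hypothetical. The later stages (fuzzy term generation, rule extraction, neuro-fuzzy training) are not shown.

```python
import math


def distance_hash(instance, center):
    """Hash an instance as its Euclidean distance to a reference point.

    Assumption: Euclidean distance; the abstract does not name the metric.
    """
    return math.sqrt(sum((x - c) ** 2 for x, c in zip(instance, center)))


def rough_partition(data, n_intervals):
    """Return (hashes, labels): a crisp pseudo-cluster label per instance.

    Assumptions: the reference point is the per-feature minimum, and the
    hash axis is cut into n_intervals equal-width intervals.
    """
    n_features = len(data[0])
    center = [min(row[j] for row in data) for j in range(n_features)]

    # One pass over the data: no pairwise distance matrix is built.
    hashes = [distance_hash(row, center) for row in data]

    lo, hi = min(hashes), max(hashes)
    width = (hi - lo) / n_intervals or 1.0  # guard against identical hashes

    # Interval index on the hash axis acts as the pseudo-class label.
    labels = [min(int((h - lo) / width), n_intervals - 1) for h in hashes]
    return hashes, labels


if __name__ == "__main__":
    sample = [[0.0, 0.0], [0.1, 0.1], [5.0, 5.0], [5.1, 5.0]]
    hashes, labels = rough_partition(sample, n_intervals=2)
    print(labels)
```

Computing one hash per instance takes time and memory linear in the number of instances, which is where the claimed savings over pairwise distances come from. The price is that instances lying at the same distance from the reference point in different directions receive the same hash, so the crisp partition is only a rough starting point for the subsequent fuzzy stage.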

Source journal

Radio Electronics Computer Science Control (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE)

Self-citation rate: 20.00%

Review time: 12 weeks