Gödel Number based Clustering Algorithm with Decimal First Degree Cellular Automata
Vicky Vikrant, Narodia Parth P, Kamalika Bhattacharjee
arXiv - CS - Formal Languages and Automata Theory, 2024-05-08 (arXiv 2405.04881)
Abstract
In this paper, a clustering algorithm based on decimal first degree cellular
automata (FDCA) is proposed, in which clusters are created based on
reachability. Cyclic spaces are created, and configurations that lie in the
same cycle are treated as belonging to the same cluster. Real-life data
objects are encoded into decimal strings using a Gödel number based encoding.
The benefit of this scheme is that it reduces the encoded string length while
maintaining the feature properties. Candidate CA rules are identified based on
theoretical criteria such as self-replication and information flow. An
iterative algorithm is developed to generate the desired number of clusters
over three stages. The clustering results are evaluated using benchmark
clustering metrics such as the Silhouette score, Davies-Bouldin index,
Calinski-Harabasz index and Dunn index. In comparison with existing
state-of-the-art clustering algorithms, the proposed algorithm gives better
performance.
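
The reachability idea above (configurations in the same cycle form one cluster) can be illustrated with a small sketch. It is not the authors' algorithm. The abstract gives neither the FDCA rule parameters, the Gödel number based encoding, nor the three stages. As an assumption, the sketch uses a linear local rule of the form (a*left + b*self + c*right) mod 10 on a ring of cells as a stand-in, and takes data objects as already encoded into decimal strings. The names `step` and `cycle_clusters` are hypothetical.

```python
from itertools import product

D = 10  # decimal CA: each cell holds a digit 0..9


def step(config, coeffs):
    """Apply one synchronous update with periodic boundary.

    Assumed stand-in rule (not taken from the paper):
    new[i] = (a*left + b*self + c*right) mod 10
    """
    a, b, c = coeffs
    n = len(config)
    return tuple(
        (a * config[i - 1] + b * config[i] + c * config[(i + 1) % n]) % D
        for i in range(n)
    )


def cycle_clusters(n_cells, coeffs):
    """Partition all 10**n_cells configurations into cycles.

    Returns (label, clusters): label maps each configuration to a cluster id,
    clusters lists the configurations of each cycle. Raises ValueError if the
    rule is not reversible, because then the space is not made of cycles only.
    """
    label = {}
    clusters = []
    for start in product(range(D), repeat=n_cells):
        if start in label:
            continue
        cycle = []
        cur = start
        # Follow the orbit until we hit a configuration seen before.
        while cur not in label:
            label[cur] = len(clusters)
            cycle.append(cur)
            cur = step(cur, coeffs)
        if cur != start:
            raise ValueError("rule is not reversible on this lattice size")
        clusters.append(cycle)
    return label, clusters


if __name__ == "__main__":
    # (1, 0, 0) copies the left neighbour, i.e. a cyclic shift (reversible).
    label, clusters = cycle_clusters(3, (1, 0, 0))
    encoded = ["123", "312", "132"]  # hypothetical encoded data objects
    for s in encoded:
        print(s, "-> cluster", label[tuple(int(ch) for ch in s)])
```

With the shift rule, "123" and "312" are reachable from each other and share a cluster, while "132" lies on a different cycle. Enumerating all 10^n configurations is only feasible for very short strings, which is one reason a compact encoding matters.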