Exploring structural components in autoencoder-based data clustering

IF 7.5 · Zone 2 (Computer Science) · JCR Q1 (AUTOMATION & CONTROL SYSTEMS)

Engineering Applications of Artificial Intelligence · Pub Date: 2024-11-25 · DOI: 10.1016/j.engappai.2024.109562

Sujoy Chatterjee, Suvra Jyoti Choudhury

Volume 140, Article 109562. Publisher page: https://www.sciencedirect.com/science/article/pii/S0952197624017202

Citations: 0

Abstract



Exploring structural components in autoencoder-based data clustering

Clustering is a fundamental machine-learning task that has been studied extensively in the literature. Traditional clustering approaches rest on the premise that data are first converted into vectorized features through various representation-learning techniques. As data become more intricate, conventional clustering methods can no longer handle their high dimensionality. Numerous representation-learning strategies based on deep architectures have been proposed over the years, particularly deep unsupervised learning, owing to its superiority over conventional approaches. In most existing research, especially in autoencoder-based approaches, only the pairwise distance information of points in the original data space is retained in the latent space. However, it is important to combine this with additional preserved factors, such as the variance of the original data and the independent components of the latent space. In addition, the model's stability under noisy conditions is crucial. This paper presents a method for clustering data that combines an autoencoder (AE), principal component analysis (PCA), and independent component analysis (ICA) to capture a relevant latent-space representation. To further lower the dimensionality and improve clustering effectiveness, two dimensionality-reduction algorithms are employed, namely PCA and t-distributed stochastic neighbor embedding (t-SNE). The proposed technique produces more precise and reliable clustering by exploiting the advantages of both approaches. To compare the proposed methods with conventional clustering methods and stand-alone autoencoders, we conduct comprehensive experiments on 13 real-life datasets. The outcomes demonstrate the approach's potential for addressing complicated clustering problems and, importantly, its effectiveness even under noisy conditions.
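The abstract does not say how the AE, PCA, and ICA components are actually combined, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes scikit-learn is available, uses a one-hidden-layer `MLPRegressor` trained to reconstruct its input as a stand-in autoencoder, fuses the three feature sets by plain concatenation (a hypothetical choice), reduces the result to two dimensions with PCA or t-SNE, and clusters with k-means (also an assumption; the abstract names no clustering algorithm). The function names `ae_latent` and `cluster_pipeline` are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, FastICA
from sklearn.manifold import TSNE
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler


def ae_latent(X, latent_dim=3, seed=0):
    """Train a small MLP to reconstruct X and return its hidden-layer codes."""
    ae = MLPRegressor(hidden_layer_sizes=(latent_dim,), activation="tanh",
                      max_iter=2000, random_state=seed)
    ae.fit(X, X)  # autoencoder-style training: the target is the input itself
    # Encoder output = hidden activations (tanh matches the activation above).
    return np.tanh(X @ ae.coefs_[0] + ae.intercepts_[0])


def cluster_pipeline(X, n_clusters, reducer="pca", seed=0):
    """Hypothetical AE + PCA + ICA fusion, 2-D reduction, then k-means."""
    X = StandardScaler().fit_transform(X)
    z = ae_latent(X, seed=seed)  # AE latent codes
    # Variance-preserving directions of the original data.
    pcs = PCA(n_components=2, random_state=seed).fit_transform(X)
    # Independent components estimated in the latent space.
    ics = FastICA(n_components=2, random_state=seed,
                  max_iter=1000).fit_transform(z)
    # Assumption: fuse by concatenation, then rescale to a common scale.
    fused = StandardScaler().fit_transform(np.hstack([z, pcs, ics]))
    if reducer == "pca":
        emb = PCA(n_components=2, random_state=seed).fit_transform(fused)
    elif reducer == "tsne":
        emb = TSNE(n_components=2, random_state=seed).fit_transform(fused)
    else:
        raise ValueError("reducer must be 'pca' or 'tsne'")
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(emb)
    return emb, labels


if __name__ == "__main__":
    data = load_iris().data  # small bundled dataset, used only for illustration
    emb, labels = cluster_pipeline(data, n_clusters=3, reducer="pca")
    print(emb.shape, np.bincount(labels))
```

To probe the noise robustness the abstract emphasizes, one could add Gaussian noise to the input before calling `cluster_pipeline` and compare the resulting labels with the noise-free run; the paper's own noise protocol is not described in the abstract.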


Source journal

Engineering Applications of Artificial Intelligence (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 9.60
Self-citation rate: 10.00%
Articles published: 505
Review time: 68 days

Journal description: Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution, witnessing remarkable advancements across various machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research elucidating the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI utilized in real-world engineering applications, validated using publicly available datasets to ensure the replicability of research outcomes. Join us in exploring the transformative potential of AI in engineering.