Diversity-induced fuzzy clustering with Laplacian regularization

IF 8.1; Region 1, Computer Science; JCR category: COMPUTER SCIENCE, INFORMATION SYSTEMS

Information Sciences Pub Date : 2025-04-24 DOI:10.1016/j.ins.2025.122225

Yunlong Gao , Qinting Wu , Zhenghong Xu , Jinyan Pan , Guifang Shao , Qingyuan Zhu , Feiping Nie

Information Sciences, volume 715, Article 122225. https://www.sciencedirect.com/science/article/pii/S0020025525003573


Abstract

Fuzzy clustering is a fundamental technique in unsupervised learning for exploring data structures. However, fuzzy c-means (FCM), a representative fuzzy clustering algorithm, performs relatively poorly on noisy data and outliers because it considers only global data characteristics and ignores local information. Additionally, FCM overlooks data diversity, which makes complex data difficult to handle and leads to overlapping cluster centers. To address these challenges, this paper proposes a novel approach called diversity-induced fuzzy clustering with Laplacian regularization (DiFCMLR). DiFCMLR incorporates the Hilbert-Schmidt Independence Criterion (HSIC) to maximize the independence among clusters, thereby enhancing clustering diversity. In addition, DiFCMLR introduces Laplacian regularization to account for the local information of the data and to determine the affinity relationships between samples. Furthermore, it corrects the Euclidean distance between samples, thereby reducing the impact of FCM's normal-distribution prior assumption and improving the applicability of the algorithm to complex data and size-imbalanced problems. During optimization, DiFCMLR uses iterative reweighting and the alternating direction method of multipliers, which enhance robustness against noise and outliers and achieve faster convergence towards better solutions. The effectiveness of DiFCMLR is confirmed through theoretical analysis and experimental evaluation.
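The ingredients named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' DiFCMLR objective or optimizer, which the abstract does not spell out. It shows only the standard FCM updates, the usual empirical HSIC estimator, and a graph-Laplacian smoothness term. The function names, the Gaussian affinity with bandwidth `sigma`, and the choice of a linear kernel on membership columns are all assumptions made for illustration.

```python
import numpy as np

def fcm(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means (the baseline, not DiFCMLR)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)            # memberships of each sample sum to 1
    for _ in range(n_iter):
        Um = U ** m
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]  # centers: membership-weighted means
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)               # guard against division by zero
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V

def hsic(K, L):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2, with centering matrix H."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def laplacian_penalty(X, U, sigma=1.0):
    """tr(U^T L U) with L = D - W; W is an assumed Gaussian affinity."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    Lap = np.diag(W.sum(axis=1)) - W
    return np.trace(U.T @ Lap @ U)

# Two well-separated synthetic blobs (illustrative data, not from the paper).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (30, 2)), rng.normal(5.0, 0.3, (30, 2))])
U, V = fcm(X, c=2)

# Dependence between the two membership columns under a linear kernel (assumed choice).
dependence = hsic(np.outer(U[:, 0], U[:, 0]), np.outer(U[:, 1], U[:, 1]))
# Small when nearby samples have similar membership vectors.
smoothness = laplacian_penalty(X, U)
```

In a diversity-regularized objective, a dependence term of this kind would be driven down to push clusters towards independence, while the Laplacian term rewards membership vectors that vary smoothly over the affinity graph. How DiFCMLR weights and combines these terms is specified in the paper, not here.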




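Source journal

Information Sciences (Engineering and Technology, Computer Science: Information Systems)

CiteScore: 14.00

Self-citation rate: 17.30%

Articles published: 1322

Review time: 10.4 months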

Journal description: Informatics and Computer Science Intelligent Systems Applications is an esteemed international journal that focuses on publishing original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorial and surveying contributions. Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.