Multivariate bounded support Kotz mixture model with semi-supervised projected model-based clustering

IF 14.7 | Zone 1, Computer Science | JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Tsega Weldu Araya, Muhammad Azam, Nizar Bouguila, Jamal Bentahar
Information Fusion, vol. 124, Article 103330. Published 2025-06-13. DOI: 10.1016/j.inffus.2025.103330
Publisher page: https://www.sciencedirect.com/science/article/pii/S1566253525004038
Citations: 0

Abstract

Multivariate bounded support Kotz mixture model with semi-supervised projected model-based clustering
Data clustering is a crucial technique in data analysis, aimed at identifying and grouping similar data points to uncover underlying structures within a dataset. We propose a new unsupervised clustering approach using a multivariate bounded Kotz mixture model (BKMM) for data modeling when the data lie within a bounded support region. In many real applications, BKMM effectively handles observed data that fall within these limits, accurately modeling and clustering the observations. In BKMM, parameter estimation is performed by maximizing the log-likelihood using the Expectation–Maximization (EM) algorithm and the Newton–Raphson method. Additionally, we explore the enhancements in clustering performance through semi-supervised learning by incorporating a small amount of labeled data to guide the clustering process. Thus, we propose a bounded Kotz mixture model using a semi-supervised projected model-based clustering method (BKMM-SeSProC) to obtain hidden cluster labels. Model selection in mixtures is essential for determining the optimal number of mixture components, and we introduce a minimum message length (MML) model selection criterion to find the best number of clusters in the BKMM-SeSProC approach. A greedy forward search is applied to estimate the optimal number of clusters. We use the same datasets to evaluate our proposed models, BKMM and BKMM-SeSProC, for data clustering. Additionally, we utilize MML model selection with BKMM-SeSProC to determine the number of components. Initially, we validate both proposed models and the model selection process in various medical applications. Furthermore, to assess their broader performance, we test the models on image datasets, including Alzheimer’s disease, lung tissue, and gastrointestinal tract images for disease recognition, and the CIFAR-100 dataset for object categorization. 
BKMM is compared with the Kotz mixture model (KMM), Student’s t mixture model (SMM), Laplace mixture model (LMM), bounded Gaussian mixture model (BGMM), and Gaussian mixture model (GMM) under similar experimental settings across all datasets. To evaluate the performance of BKMM and BKMM-SeSProC, several performance metrics are employed. To find the best number of clusters for BKMM-SeSProC, we examine the effectiveness of MML model selection against seven different criteria. The experimental results demonstrate that the proposed BKMM outperforms the compared models, KMM, SMM, LMM, BGMM, and GMM, in all applications. Additionally, the semi-supervised projected model-based clustering shows better performance across all evaluation metrics compared to unsupervised BKMM.
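The bounded-support idea in the abstract can be illustrated with a small sketch. This is not the authors' BKMM: for simplicity it swaps in a univariate Gaussian in place of the multivariate Kotz density, and the function names (`bounded_pdf`, `responsibilities`) and the interval `[lo, hi]` are assumptions made for illustration. It shows two things that carry over in spirit: a component density is set to zero outside the support and renormalised to integrate to one inside it, and the E-step of EM turns the weighted bounded densities into posterior cluster probabilities.

```python
import math


def normal_pdf(x, mu, sigma):
    """Ordinary (unbounded) Gaussian density."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu, sigma):
    """Gaussian cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def bounded_pdf(x, mu, sigma, lo, hi):
    """Gaussian density restricted to [lo, hi] and renormalised.

    Illustrative stand-in for a bounded-support component: zero outside
    the support, divided by the probability mass inside it.
    """
    if x < lo or x > hi:
        return 0.0
    mass = normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)
    return normal_pdf(x, mu, sigma) / mass


def responsibilities(x, weights, mus, sigmas, lo, hi):
    """E-step for one observation x (assumed to satisfy lo <= x <= hi).

    Returns the posterior probability of each mixture component.
    """
    joint = [
        w * bounded_pdf(x, m, s, lo, hi)
        for w, m, s in zip(weights, mus, sigmas)
    ]
    total = sum(joint)
    return [j / total for j in joint]
```

For example, `responsibilities(0.25, [0.6, 0.4], [0.3, 0.8], [0.2, 0.1], 0.0, 1.0)` assigns almost all posterior mass to the first component, since 0.25 lies close to its mean. A full EM fit would alternate this step with parameter updates, which the abstract says are obtained with the Newton–Raphson method.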
Source journal: Information Fusion
Category: Engineering & Technology, Computer Science: Theory & Methods
CiteScore: 33.20
Self-citation rate: 4.30%
Articles published: 161
Review time: 7.9 months
About the journal: Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses as well as those demonstrating their application to real-world problems will be welcome.