{"title":"Clustering Longitudinal Data for Growth Curve Modelling by Gibbs Sampler and Information Criterion","authors":"Yu Fei, Rongli Li, Zhouhong Li, Guoqi Qian","doi":"10.1007/s00357-024-09477-z","DOIUrl":"https://doi.org/10.1007/s00357-024-09477-z","url":null,"abstract":"<p>Clustering longitudinal data for growth curve modelling is considered in this paper, where we aim to optimally estimate the underpinning unknown group partition matrix. Instead of following the conventional soft clustering approach, which assumes the columns of the partition matrix to have i.i.d. multinomial or categorical prior distributions and uses a regression model with the response following a finite mixture distribution to estimate the posterior distribution of the partition matrix, we propose an iterative partition and regression procedure to find the best partition matrix and the associated best growth curve regression model for each identified cluster. We show that the best partition matrix is the one minimizing a recently developed empirical Bayes information criterion (eBIC), which, due to the involved combinatorial explosion, is difficult to compute via enumerating all candidate partition matrices. Thus, we develop a Gibbs sampling method to generate a Markov chain of candidate partition matrices that has its equilibrium probability distribution equal the one induced from eBIC. We further show that the best partition matrix, given a priori the number of latent clusters, can be consistently estimated and is computationally scalable based on this Markov chain. The number of latent clusters is also best estimated by minimizing eBIC. The proposed iterative clustering and regression method is assessed by a comprehensive simulation study before being applied to two real-world growth curve modelling examples involving longitudinal data clustering.</p>","PeriodicalId":50241,"journal":{"name":"Journal of Classification","volume":"199 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141510498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Finding Outliers in Gaussian Model-based Clustering","authors":"Katharine M. Clark, Paul D. McNicholas","doi":"10.1007/s00357-024-09473-3","DOIUrl":"https://doi.org/10.1007/s00357-024-09473-3","url":null,"abstract":"<p>Clustering, or unsupervised classification, is a task often plagued by outliers. Yet there is a paucity of work on handling outliers in clustering. Outlier identification algorithms tend to fall into three broad categories: outlier inclusion, outlier trimming, and <i>post hoc</i> outlier identification methods, with the former two often requiring pre-specification of the number of outliers. The fact that sample squared Mahalanobis distance is beta-distributed is used to derive an approximate distribution for the log-likelihoods of subset finite Gaussian mixture models. An algorithm is then proposed that removes the least plausible points according to the subset log-likelihoods, which are deemed outliers, until the subset log-likelihoods adhere to the reference distribution. This results in a trimming method, called OCLUST, that inherently estimates the number of outliers.</p>","PeriodicalId":50241,"journal":{"name":"Journal of Classification","volume":"13 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141188966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Classification Algorithm Based on the Synergy Between Dynamic Clustering with Adaptive Distances and K-Nearest Neighbors","authors":"Mohammed Sabri, Rosanna Verde, Antonio Balzanella, Fabrizio Maturo, Hamid Tairi, Ali Yahyaouy, Jamal Riffi","doi":"10.1007/s00357-024-09471-5","DOIUrl":"https://doi.org/10.1007/s00357-024-09471-5","url":null,"abstract":"<p>This paper introduces a novel supervised classification method based on dynamic clustering (DC) and K-nearest neighbor (KNN) learning algorithms, denoted DC-KNN. The aim is to improve the accuracy of a classifier by using a DC method to discover the hidden patterns of the apriori groups of the training set. It provides a partitioning of each group into a predetermined number of subgroups. A new objective function is designed for the DC variant, based on a trade-off between the compactness and separation of all subgroups in the original groups. Moreover, the proposed DC method uses adaptive distances which assign a set of weights to the variables of each cluster, which depend on both their intra-cluster and inter-cluster structure. DC-KNN performs the minimization of a suitable objective function. Next, the KNN algorithm takes into account objects by assigning them to the label of subgroups. Furthermore, the classification step is performed according to two KNN competing algorithms. The proposed strategies have been evaluated using both synthetic data and widely used real datasets from public repositories. The achieved results have confirmed the effectiveness and robustness of the strategy in improving classification accuracy in comparison to alternative approaches.</p>","PeriodicalId":50241,"journal":{"name":"Journal of Classification","volume":"94 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140938766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerated Sequential Data Clustering","authors":"Reza Mortazavi, Elham Enayati, Abdolali Basiri","doi":"10.1007/s00357-024-09472-4","DOIUrl":"https://doi.org/10.1007/s00357-024-09472-4","url":null,"abstract":"<p>Data clustering is an important task in the field of data mining. In many real applications, clustering algorithms must consider the order of data, resulting in the problem of clustering sequential data. For instance, analyzing the moving pattern of an object and detecting community structure in a complex network are related to sequential data clustering. The constraint of the continuous region prevents previous clustering algorithms from being directly applied to the problem. A dynamic programming algorithm was proposed to address the issue, which returns the optimal sequential data clustering. However, it is not scalable and hence the practicality is limited. This paper revisits the solution and enhances it by introducing a greedy stopping condition. This condition halts the algorithm’s search process when it is likely that the optimal solution has been found. Experimental results on multiple datasets show that the algorithm is much faster than its original solution while the optimality gap is negligible.</p>","PeriodicalId":50241,"journal":{"name":"Journal of Classification","volume":"25 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140938688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Skew Multiple Scaled Mixtures of Normal Distributions with Flexible Tail Behavior and Their Application to Clustering","authors":"Abbas Mahdavi, Anthony F. Desmond, Ahad Jamalizadeh, Tsung-I Lin","doi":"10.1007/s00357-024-09470-6","DOIUrl":"https://doi.org/10.1007/s00357-024-09470-6","url":null,"abstract":"<p>The family of multiple scaled mixtures of multivariate normal (MSMN) distributions has been shown to be a powerful tool for modeling data that allow different marginal amounts of tail weight. An extension of the MSMN distribution is proposed through the incorporation of a vector of shape parameters, resulting in the skew multiple scaled mixtures of multivariate normal (SMSMN) distributions. The family of SMSMN distributions can express a variety of shapes by controlling different degrees of tailedness and versatile skewness in each dimension. Some characterizations and probabilistic properties of the SMSMN distributions are studied and an extension to finite mixtures thereof is also discussed. Based on a sort of selection mechanism, a feasible ECME algorithm is designed to compute the maximum likelihood estimates of model parameters. Numerical experiments on simulated data and three real data examples demonstrate the efficacy and usefulness of the proposed methodology.</p>","PeriodicalId":50241,"journal":{"name":"Journal of Classification","volume":"111 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140886602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multinomial Restricted Unfolding","authors":"Mark de Rooij, Frank Busing","doi":"10.1007/s00357-024-09465-3","DOIUrl":"https://doi.org/10.1007/s00357-024-09465-3","url":null,"abstract":"<p>For supervised classification we propose to use restricted multidimensional unfolding in a multinomial logistic framework. Where previous research proposed similar models based on squared distances, we propose to use usual (i.e., not squared) Euclidean distances. This change in functional form results in several interpretational advantages of the resulting biplot, a graphical representation of the classification model. First, the conditional probability of any class peaks at the location of the class in the Euclidean space. Second, the interpretation of the biplot is in terms of distances towards the class points, whereas in the squared distance model the interpretation is in terms of the distance towards the decision boundary. Third, the distance between two class points represents an upper bound for the estimated log-odds of choosing one of these classes over the other. For our multinomial restricted unfolding, we develop and test a Majorization Minimization algorithm that monotonically decreases the negative log-likelihood. With two empirical applications we point out the advantages of the distance model and show how to apply multinomial restricted unfolding in practice, including model selection.</p>","PeriodicalId":50241,"journal":{"name":"Journal of Classification","volume":"40 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140589402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inferential Tools for Assessing Dependence Across Response Categories in Multinomial Models with Discrete Random Effects","authors":"","doi":"10.1007/s00357-024-09466-2","DOIUrl":"https://doi.org/10.1007/s00357-024-09466-2","url":null,"abstract":"<h3>Abstract</h3> <p>We propose a discrete random effects multinomial regression model to deal with estimation and inference issues in the case of categorical and hierarchical data. Random effects are assumed to follow a discrete distribution with an a priori unknown number of support points. For a <em>K</em>-categories response, the modelling identifies a latent structure at the highest level of grouping, where groups are clustered into subpopulations. This model does not assume the independence across random effects relative to different response categories, and this provides an improvement from the multinomial semi-parametric multilevel model previously proposed in the literature. Since the category-specific random effects arise from the same subjects, the independence assumption is seldom verified in real data. To evaluate the improvements provided by the proposed model, we reproduce simulation and case studies of the literature, highlighting the strength of the method in properly modelling the real data structure and the advantages that taking into account the data dependence structure offers.</p>","PeriodicalId":50241,"journal":{"name":"Journal of Classification","volume":"62 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140034076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Binary Peacock Algorithm: A Novel Metaheuristic Approach for Feature Selection","authors":"Hema Banati, Richa Sharma, Asha Yadav","doi":"10.1007/s00357-024-09468-0","DOIUrl":"https://doi.org/10.1007/s00357-024-09468-0","url":null,"abstract":"<p>Binary metaheuristic algorithms prove to be invaluable for solving binary optimization problems. This paper proposes a binary variant of the peacock algorithm (PA) for feature selection. PA, a recent metaheuristic algorithm, is built upon lekking and mating behaviors of peacocks and peahens. While designing the binary variant, two major shortcomings of PA (lek formation and offspring generation) were identified and addressed. Eight binary variants of PA are also proposed and compared over mean fitness to identify the best variant, called binary peacock algorithm (bPA). To validate bPA’s performance experiments are conducted using 34 benchmark datasets and results are compared with eight well-known binary metaheuristic algorithms. The results show that bPA classifies 30 datasets with highest accuracy and extracts minimum features in 32 datasets, achieving up to 99.80% reduction in the feature subset size in the dataset with maximum features. bPA attained rank 1 in Friedman rank test over all parameters.</p>","PeriodicalId":50241,"journal":{"name":"Journal of Classification","volume":"11 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140034222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Supervised Classification of High-Dimensional Correlated Data: Application to Genomic Data","authors":"Aboubacry Gaye, Abdou Ka Diongue, Seydou Nourou Sylla, Maryam Diarra, Amadou Diallo, Cheikh Talla, Cheikh Loucoubar","doi":"10.1007/s00357-024-09463-5","DOIUrl":"https://doi.org/10.1007/s00357-024-09463-5","url":null,"abstract":"<p>This work addresses the problem of supervised classification for high-dimensional and highly correlated data using correlation blocks and supervised dimension reduction. We propose a method that combines block partitioning based on interval graph modeling and an extension of principal component analysis (PCA) incorporating conditional class moment estimates in the low-dimensional projection. Block partitioning allows us to handle the high correlation of our data by grouping them into blocks where the correlation within the same block is maximized and the correlation between variables in different blocks is minimized. The extended PCA allows us to perform low-dimensional projection and clustering supervised. Applied to gene expression data from 445 individuals divided into two groups (diseased and non-diseased) and 719,656 single nucleotide polymorphisms (SNPs), this method shows good clustering and prediction performances. SNPs are a type of genetic variation that represents a difference in a single deoxyribonucleic acid (DNA) building block, namely a nucleotide. Previous research has shown that SNPs can be used to identify the correct population origin of an individual and can act in isolation or simultaneously to impact a phenotype. In this regard, the study of the contribution of genetics in infectious disease phenotypes is crucial. The classical statistical models currently used in the field of genome-wide association studies (GWAS) have shown their limitations in detecting genes of interest in the study of complex diseases such as asthma or malaria. In this study, we first investigate a linkage disequilibrium (LD) block partition method based on interval graph modeling to handle the high correlation between SNPs. Then, we use supervised approaches, in particular, the approach that extends PCA by incorporating conditional class moment estimates in the low-dimensional projection, to identify the determining SNPs in malaria episodes. Experimental results obtained on the Dielmo-Ndiop project dataset show that the linear discriminant analysis (LDA) approach has significantly high accuracy in predicting malaria episodes.</p>","PeriodicalId":50241,"journal":{"name":"Journal of Classification","volume":"6 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140011375","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Soft Label Guided Unsupervised Discriminative Sparse Subspace Feature Selection","authors":"Keding Chen, Yong Peng, Feiping Nie, Wanzeng Kong","doi":"10.1007/s00357-024-09462-6","DOIUrl":"https://doi.org/10.1007/s00357-024-09462-6","url":null,"abstract":"<p>Feature selection and subspace learning are two primary methods to achieve data dimensionality reduction and discriminability enhancement. However, data label information is unavailable in unsupervised learning to guide the dimensionality reduction process. To this end, we propose a soft label guided unsupervised discriminative sparse subspace feature selection (UDS<span>(^2)</span>FS) model in this paper, which consists of two superiorities in comparison with the existing studies. On the one hand, UDS<span>(^2)</span>FS aims to find a discriminative subspace to simultaneously maximize the between-class data scatter and minimize the within-class scatter. On the other hand, UDS<span>(^2)</span>FS estimates the data label information in the learned subspace, which further serves as the soft labels to guide the discriminative subspace learning process. Moreover, the <span>(ell _{2,0})</span>-norm is imposed to achieve row sparsity of the subspace projection matrix, which is parameter-free and more stable compared to the <span>(ell _{2,1})</span>-norm. Experimental studies to evaluate the performance of UDS<span>(^2)</span>FS are performed from three aspects, i.e., a synthetic data set to check its iterative optimization process, several toy data sets to visualize the feature selection effect, and some benchmark data sets to examine the clustering performance of UDS<span>(^2)</span>FS. From the obtained results, UDS<span>(^2)</span>FS exhibits competitive performance in joint subspace learning and feature selection in comparison with some related models.</p>","PeriodicalId":50241,"journal":{"name":"Journal of Classification","volume":"330 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139559325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}