Journal of Classification: Latest Articles

Inferential Tools for Assessing Dependence Across Response Categories in Multinomial Models with Discrete Random Effects
IF 2, Zone 4, Computer Science
Journal of Classification Pub Date : 2024-03-04 DOI: 10.1007/s00357-024-09466-2
Abstract: We propose a discrete random effects multinomial regression model to deal with estimation and inference issues in the case of categorical and hierarchical data. Random effects are assumed to follow a discrete distribution with an a priori unknown number of support points. For a K-category response, the model identifies a latent structure at the highest level of grouping, where groups are clustered into subpopulations. The model does not assume independence across the random effects associated with different response categories, which is an improvement over the multinomial semi-parametric multilevel model previously proposed in the literature. Since the category-specific random effects arise from the same subjects, the independence assumption is seldom verified in real data. To evaluate the improvements provided by the proposed model, we reproduce simulation and case studies from the literature, highlighting the strength of the method in properly modelling the real data structure and the advantages of taking the dependence structure of the data into account.
Citations: 0
Prediction of Forest Fire Risk for Artillery Military Training using Weighted Support Vector Machine for Imbalanced Data
IF 2, Zone 4, Computer Science
Journal of Classification Pub Date : 2024-03-04 DOI: 10.1007/s00357-024-09467-1
Ji Hyun Nam, Jongmin Mun, Seongil Jo, Jaeoh Kim
Abstract: Since the 1953 truce, the Republic of Korea Army (ROKA) has regularly conducted artillery training, posing a risk of wildfires, a threat to both the environment and the public perception of national defense. To assess this risk and aid decision-making within the ROKA, we built a predictive model of wildfires triggered by artillery training. To this end, we combined the ROKA dataset with a meteorological database. Given the infrequent occurrence of wildfires (imbalance ratio $\approx$ 1:24 in our dataset), achieving balanced detection of wildfire occurrences and non-occurrences is challenging. Our approach combines a weighted support vector machine with Gaussian mixture-based oversampling, effectively penalizing misclassification of the wildfires. Applied to our dataset, our method outperforms traditional algorithms (G-mean = 0.864, sensitivity = 0.956, specificity = 0.781), indicating balanced detection. This study not only helps reduce wildfires during artillery training but also provides a practical wildfire prediction method for similar climates worldwide.
Citations: 0
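The G-mean reported in the abstract above can be checked from the other two figures. The sketch below assumes the standard definition, the geometric mean of sensitivity and specificity, which the abstract does not spell out; the function name `g_mean` is illustrative only.

```python
import math


def g_mean(sensitivity: float, specificity: float) -> float:
    """Geometric mean of sensitivity and specificity (assumed definition)."""
    return math.sqrt(sensitivity * specificity)


# Figures quoted in the abstract: sensitivity = 0.956, specificity = 0.781.
value = g_mean(0.956, 0.781)
print(round(value, 3))  # 0.864
```

Under that definition the three reported numbers are mutually consistent.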
Binary Peacock Algorithm: A Novel Metaheuristic Approach for Feature Selection
IF 2, Zone 4, Computer Science
Journal of Classification Pub Date : 2024-03-04 DOI: 10.1007/s00357-024-09468-0
Hema Banati, Richa Sharma, Asha Yadav
Abstract: Binary metaheuristic algorithms prove to be invaluable for solving binary optimization problems. This paper proposes a binary variant of the peacock algorithm (PA) for feature selection. PA, a recent metaheuristic algorithm, is built upon the lekking and mating behaviors of peacocks and peahens. While designing the binary variant, two major shortcomings of PA (lek formation and offspring generation) were identified and addressed. Eight binary variants of PA are also proposed and compared over mean fitness to identify the best variant, called the binary peacock algorithm (bPA). To validate bPA's performance, experiments are conducted using 34 benchmark datasets, and results are compared with eight well-known binary metaheuristic algorithms. The results show that bPA classifies 30 datasets with the highest accuracy and extracts the minimum number of features in 32 datasets, achieving up to 99.80% reduction in the feature subset size in the dataset with the most features. bPA attained rank 1 in the Friedman rank test over all parameters.
Citations: 0
Supervised Classification of High-Dimensional Correlated Data: Application to Genomic Data
IF 2, Zone 4, Computer Science
Journal of Classification Pub Date : 2024-02-28 DOI: 10.1007/s00357-024-09463-5
Aboubacry Gaye, Abdou Ka Diongue, Seydou Nourou Sylla, Maryam Diarra, Amadou Diallo, Cheikh Talla, Cheikh Loucoubar
Abstract: This work addresses the problem of supervised classification for high-dimensional and highly correlated data using correlation blocks and supervised dimension reduction. We propose a method that combines block partitioning based on interval graph modeling and an extension of principal component analysis (PCA) incorporating conditional class moment estimates in the low-dimensional projection. Block partitioning allows us to handle the high correlation of our data by grouping variables into blocks where the correlation within the same block is maximized and the correlation between variables in different blocks is minimized. The extended PCA allows us to perform supervised low-dimensional projection and clustering. Applied to gene expression data from 445 individuals divided into two groups (diseased and non-diseased) and 719,656 single nucleotide polymorphisms (SNPs), this method shows good clustering and prediction performance. SNPs are a type of genetic variation that represents a difference in a single deoxyribonucleic acid (DNA) building block, namely a nucleotide. Previous research has shown that SNPs can be used to identify the correct population origin of an individual and can act in isolation or simultaneously to impact a phenotype. In this regard, the study of the contribution of genetics to infectious disease phenotypes is crucial. The classical statistical models currently used in the field of genome-wide association studies (GWAS) have shown their limitations in detecting genes of interest in the study of complex diseases such as asthma or malaria.
In this study, we first investigate a linkage disequilibrium (LD) block partition method based on interval graph modeling to handle the high correlation between SNPs. Then, we use supervised approaches, in particular the approach that extends PCA by incorporating conditional class moment estimates in the low-dimensional projection, to identify the determining SNPs in malaria episodes. Experimental results obtained on the Dielmo-Ndiop project dataset show that the linear discriminant analysis (LDA) approach has significantly high accuracy in predicting malaria episodes.
Citations: 0
Soft Label Guided Unsupervised Discriminative Sparse Subspace Feature Selection
IF 2, Zone 4, Computer Science
Journal of Classification Pub Date : 2024-01-25 DOI: 10.1007/s00357-024-09462-6
Keding Chen, Yong Peng, Feiping Nie, Wanzeng Kong
Abstract: Feature selection and subspace learning are two primary methods to achieve data dimensionality reduction and discriminability enhancement. However, data label information is unavailable in unsupervised learning to guide the dimensionality reduction process. To this end, we propose a soft label guided unsupervised discriminative sparse subspace feature selection (UDS$^2$FS) model in this paper, which offers two advantages over existing studies. On the one hand, UDS$^2$FS aims to find a discriminative subspace to simultaneously maximize the between-class data scatter and minimize the within-class scatter. On the other hand, UDS$^2$FS estimates the data label information in the learned subspace, which further serves as the soft labels to guide the discriminative subspace learning process. Moreover, the $\ell_{2,0}$-norm is imposed to achieve row sparsity of the subspace projection matrix, which is parameter-free and more stable compared to the $\ell_{2,1}$-norm. Experimental studies to evaluate the performance of UDS$^2$FS are performed from three aspects, i.e., a synthetic data set to check its iterative optimization process, several toy data sets to visualize the feature selection effect, and some benchmark data sets to examine the clustering performance of UDS$^2$FS.
From the obtained results, UDS$^2$FS exhibits competitive performance in joint subspace learning and feature selection in comparison with some related models.
Citations: 0
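The row-sparsity idea in the abstract above can be illustrated with a small sketch. It assumes the usual definitions, which the abstract does not state: the $\ell_{2,1}$-norm sums the Euclidean norms of the rows of a projection matrix, while the $\ell_{2,0}$-norm counts the rows that are not entirely zero. The matrix `W` and the helper names are hypothetical and are not taken from the paper.

```python
import math


def row_norms(W):
    """Euclidean norm of each row of a matrix given as a list of lists."""
    return [math.sqrt(sum(x * x for x in row)) for row in W]


def l21_norm(W):
    """Sum of row norms (assumed definition of the l2,1-norm)."""
    return sum(row_norms(W))


def l20_norm(W, tol=1e-12):
    """Number of rows with nonzero norm (assumed definition of the l2,0-norm)."""
    return sum(1 for r in row_norms(W) if r > tol)


def selected_features(W, tol=1e-12):
    """Indices of features whose row in W is not entirely zero."""
    return [i for i, r in enumerate(row_norms(W)) if r > tol]


# Hypothetical 4-feature, 2-dimensional projection matrix: rows 1 and 3 are zero.
W = [[3.0, 4.0], [0.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
print(l21_norm(W))           # 7.0
print(l20_norm(W))           # 2
print(selected_features(W))  # [0, 2]
```

A constraint on the $\ell_{2,0}$-norm fixes the number of retained rows, and hence features, directly, whereas an $\ell_{2,1}$ penalty only encourages zero rows through a weight that has to be tuned.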
Variable Selection for Hidden Markov Models with Continuous Variables and Missing Data
IF 2, Zone 4, Computer Science
Journal of Classification Pub Date : 2024-01-23 DOI: 10.1007/s00357-023-09457-9
Fulvia Pennoni, Francesco Bartolucci, Silvia Pandolfi
Abstract: We propose a variable selection method for multivariate hidden Markov models with continuous responses that are partially or completely missing at a given time occasion. Through this procedure, we achieve a dimensionality reduction by selecting the subset of the most informative responses for clustering individuals and simultaneously choosing the optimal number of these clusters corresponding to latent states. The approach is based on comparing different model specifications in terms of the subset of responses assumed to be dependent on the latent states, and it relies on a greedy search algorithm based on the Bayesian information criterion seen as an approximation of the Bayes factor. A suitable expectation-maximization algorithm is employed to obtain maximum likelihood estimates of the model parameters under the missing-at-random assumption. The proposal is illustrated via Monte Carlo simulation and an application where development indicators collected over eighteen years are selected, and countries are clustered into groups to evaluate their growth over time.
Citations: 0
Nonparametric Cognitive Diagnosis When Attributes Are Polytomous
IF 2, Zone 4, Computer Science
Journal of Classification Pub Date : 2024-01-11 DOI: 10.1007/s00357-023-09461-z
Youn Seon Lim
Abstract: Cognitive diagnosis models provide diagnostic information on whether examinees have mastered the skills, called "attributes," that characterize a given knowledge domain. Based on attribute mastery, distinct proficiency classes are defined to which examinees are assigned based on their item responses. Attributes are typically perceived as binary. However, polytomous attributes may yield higher precision in the assessment of examinees' attribute mastery. Karelitz (2004) introduced the ordered-category attribute coding framework (OCAC) to accommodate polytomous attributes. Other approaches to handle polytomous attributes in cognitive diagnosis have been proposed in the literature. However, the heavy parameterization of these models often created difficulties in fitting them. In this article, a nonparametric method for cognitive diagnosis is proposed for use with polytomous attributes, called the nonparametric polytomous attributes diagnostic classification (NPADC) method, that relies on an adaptation of the OCAC framework. The new NPADC method proposed here can be used with various cognitive diagnosis models. It does not require large sample sizes; it is computationally efficient and highly effective, as is evidenced by the recovery rates of the proficiency classes observed in large-scale simulation studies.
The NPADC method is also used with a real-world data set.
Citations: 0
Parsimonious Seemingly Unrelated Contaminated Normal Cluster-Weighted Models
IF 2, Zone 4, Computer Science
Journal of Classification Pub Date : 2024-01-08 DOI: 10.1007/s00357-023-09458-8
Abstract: Normal cluster-weighted models constitute a modern approach to linear regression which simultaneously performs model-based cluster analysis and multivariate linear regression analysis with random quantitative regressors. Robustified models have been recently developed, based on the use of the contaminated normal distribution, which can manage the presence of mildly atypical observations. A more flexible class of contaminated normal linear cluster-weighted models is specified here, in which the researcher is free to use a different vector of regressors for each response. The novel class also includes parsimonious models, where parsimony is attained by imposing suitable constraints on the component-covariance matrices of either the responses or the regressors. Identifiability conditions are illustrated and discussed. An expectation-conditional maximisation algorithm is provided for the maximum likelihood estimation of the model parameters. The effectiveness and usefulness of the proposed models are shown through the analysis of simulated and real datasets.
Citations: 0
Unsupervised Classification with a Family of Parsimonious Contaminated Shifted Asymmetric Laplace Mixtures
IF 2, Zone 4, Computer Science
Journal of Classification Pub Date : 2024-01-06 DOI: 10.1007/s00357-023-09460-0
Abstract: A family of parsimonious contaminated shifted asymmetric Laplace mixtures is developed for unsupervised classification of asymmetric clusters in the presence of outliers and noise. A series of constraints are applied to a modified factor analyzer structure of the component scale matrices, yielding a family of twelve models. Application of the modified factor analyzer structure and these parsimonious constraints makes these models effective for the analysis of high-dimensional data by reducing the number of free parameters that need to be estimated. A variant of the expectation-maximization algorithm is developed for parameter estimation, with convergence issues being discussed and addressed. Popular model selection criteria like the Bayesian information criterion and the integrated complete likelihood (ICL) are utilized, and a novel modification to the ICL is also considered. Through a series of simulation studies and real data analyses that include comparisons to well-established methods, we demonstrate the improvements in classification performance found using the proposed family of models.
Citations: 0
funLOCI: A Local Clustering Algorithm for Functional Data
IF 2, Zone 4, Computer Science
Journal of Classification Pub Date : 2023-12-07 DOI: 10.1007/s00357-023-09456-w
Jacopo Di Iorio, Simone Vantini
Abstract: Nowadays, an increasing number of problems involve data with one infinite continuous dimension, known as functional data. In this paper, we introduce the funLOCI algorithm, which enables the identification of functional local clusters or functional loci, i.e., subsets or groups of curves that exhibit similar behavior across the same continuous subset of the domain. The definition of functional local clusters incorporates ideas from multivariate and functional clustering and biclustering and is based on an additive model that takes into account the shape of the curves. funLOCI is a multi-step algorithm that relies on hierarchical clustering and a functional version of the mean squared residue score to identify and validate candidate loci. Subsequently, all the results are collected and ordered in a post-processing step. To evaluate our algorithm's performance, we conduct extensive simulations and compare it with other recently proposed algorithms in the literature. Furthermore, we apply funLOCI to a real-data case regarding inner carotid arteries.
Citations: 1
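The abstract above mentions a functional version of the mean squared residue score built on an additive model. As a rough illustration only, the sketch below computes the classical matrix form of that score from the biclustering literature, under the assumption that each row is a curve sampled on a common grid; it is not the functional score used by funLOCI, and the function name is hypothetical.

```python
def mean_squared_residue(A):
    """Classical mean squared residue of a matrix (rows = curves, cols = grid points).

    Each entry is compared with the additive fit
    row mean + column mean - overall mean; the score is the mean squared deviation.
    """
    n_rows, n_cols = len(A), len(A[0])
    row_means = [sum(row) / n_cols for row in A]
    col_means = [sum(A[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = A[i][j] - row_means[i] - col_means[j] + overall
            total += residue * residue
    return total / (n_rows * n_cols)


# Two curves with the same shape, shifted vertically: perfectly additive.
print(mean_squared_residue([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))  # 0.0
# Two curves with opposite shapes: not additive.
print(mean_squared_residue([[1.0, 2.0], [2.0, 1.0]]))  # 0.25
```

A score near zero indicates that the curves differ only by a vertical shift on the chosen part of the domain, which is the kind of shared shape the additive model is meant to capture.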