Advances in Data Analysis and Classification: latest articles (page 4)

Mixtures of regressions using matrix-variate heavy-tailed distributions

IF 1.6, Zone 4 (Computer Science)

Advances in Data Analysis and Classification Pub Date : 2024-03-16 DOI: 10.1007/s11634-024-00585-7

Abstract: Finite mixtures of regressions (FMRs) are powerful clustering devices used in many regression-type analyses. Unfortunately, real data often contain atypical observations that make the commonly adopted normality assumption for the mixture components inadequate. To robustify the FMR approach in a matrix-variate framework, we introduce ten FMRs based on the matrix-variate t and contaminated normal distributions. Once one of our models has been estimated and the observations have been assigned to groups, different procedures can be used to detect the atypical points in the data. An ECM algorithm is outlined for maximum likelihood parameter estimation. Using simulated data, we show the negative consequences (in terms of parameter estimates and inferred classification) of wrongly assuming normality in the presence of heavy-tailed clusters or noisy matrices; our models handle these issues properly. The atypical-point detection procedures are also investigated on the same data. A real-data analysis of the relationship between greenhouse gas emissions and their determinants is conducted, and the behavior of our models in the presence of heterogeneity and atypical observations is discussed.

Citations: 0
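The abstract mentions t and contaminated normal components and a detection step for atypical points. A minimal sketch of the two per-observation quantities such models typically compute, assuming the matrix-variate t is the normal scale-mixture form (E-step weight (ν + d)/(ν + δ)) and the contaminated normal inflates the covariance of "bad" points by η > 1. Here δ is the squared Mahalanobis-type distance of a p × r matrix from its fitted mean and d = p·r; δ is taken as given. The function names and the "P(good) < 0.5" flagging rule are illustrative assumptions, not the authors' exact procedures.

```python
import math


def _sigmoid(t: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-t))."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def t_weight(delta: float, d: int, nu: float) -> float:
    """E-step weight (nu + d) / (nu + delta) for a t component (assumed scale-mixture form).

    delta: squared distance, e.g. tr(Sigma^-1 (Y - M) Psi^-1 (Y - M)^T) for a p x r matrix Y.
    d:     number of matrix entries (p * r).
    nu:    degrees of freedom.
    """
    return (nu + d) / (nu + delta)


def cn_good_probability(delta: float, d: int, alpha: float, eta: float) -> float:
    """Posterior probability that a matrix is a 'good' point in a contaminated normal component.

    alpha: proportion of good points (0 < alpha < 1).
    eta:   covariance inflation factor (> 1) for bad points.
    """
    # log[(1 - alpha) f_bad / (alpha f_good)]: the normalizing constants differ by eta^(-d/2),
    # the exponents by (delta / 2) * (1 - 1 / eta). Working on the log scale avoids overflow.
    log_odds_bad = (
        math.log((1.0 - alpha) / alpha)
        - 0.5 * d * math.log(eta)
        + 0.5 * delta * (1.0 - 1.0 / eta)
    )
    return _sigmoid(-log_odds_bad)


def is_atypical(delta: float, d: int, alpha: float, eta: float) -> bool:
    """Hypothetical detection rule: flag the matrix as atypical if P(good) < 0.5."""
    return cn_good_probability(delta, d, alpha, eta) < 0.5
```

For a 2 × 3 matrix (d = 6) with α = 0.95 and η = 10, a matrix at δ = 5 is kept and one at δ = 60 is flagged. With ν = 4, the t weights are 10/9 and 10/64 respectively, so the distant matrix contributes far less to the M-step.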

Clustering by deep latent position model with graph convolutional network

IF 1.4, Zone 4 (Computer Science)

Advances in Data Analysis and Classification Pub Date : 2024-03-12 DOI: 10.1007/s11634-024-00583-9

Dingge Liang, Marco Corneli, Charles Bouveyron, Pierre Latouche

Abstract: With the significant increase in interactions between individuals through digital means, clustering the nodes of graphs has become a fundamental approach for analyzing large and complex networks. In this work, we propose the deep latent position model (DeepLPM), an end-to-end generative clustering approach that combines the widely used latent position model (LPM) for network analysis with a graph convolutional network encoding strategy. Moreover, an original estimation algorithm is introduced that integrates the explicit optimization of the posterior clustering probabilities via variational inference with the implicit optimization, via stochastic gradient descent, for graph reconstruction. Numerical experiments on simulated scenarios highlight the ability of DeepLPM to self-penalize the evidence lower bound when selecting the number of clusters, and demonstrate its clustering capabilities compared with state-of-the-art methods. Finally, DeepLPM is applied to an ecclesiastical network in Merovingian Gaul and to the Cora citation network to illustrate its practical value for exploring large and complex real-world networks.

Vol. 19, No. 1, pp. 237–270.

Citations: 0
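In a latent position model, each node i has a latent position z_i, and the probability of an edge between two nodes decreases with the distance between their positions. A minimal sketch of that link function, assuming the squared-distance form logit P(A_ij = 1) = γ − ‖z_i − z_j‖² (other LPM variants use the plain distance). In DeepLPM the positions are inferred through the GCN encoder rather than given; here they are fixed inputs, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def _sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def lpm_edge_probability(z_i: Sequence[float], z_j: Sequence[float], gamma: float) -> float:
    """P(A_ij = 1) = sigmoid(gamma - ||z_i - z_j||^2) (assumed squared-distance LPM)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(z_i, z_j))
    return _sigmoid(gamma - sq_dist)


def edge_probability_matrix(positions: Sequence[Sequence[float]], gamma: float) -> List[List[float]]:
    """Symmetric matrix of edge probabilities; self-loops are set to 0."""
    n = len(positions)
    return [
        [0.0 if i == j else lpm_edge_probability(positions[i], positions[j], gamma) for j in range(n)]
        for i in range(n)
    ]
```

Nodes at the same position connect with probability sigmoid(γ), and the probability falls off quickly as positions separate. This is what lets clusters of nearby positions translate into densely connected groups of nodes.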

Choosing the number of factors in factor analysis with incomplete data via a novel hierarchical Bayesian information criterion

IF 1.4, Zone 4 (Computer Science)

Advances in Data Analysis and Classification Pub Date : 2024-03-07 DOI: 10.1007/s11634-024-00582-w

Jianhua Zhao, Changchun Shang, Shulan Li, Ling Xin, Philip L. H. Yu

Abstract: The Bayesian information criterion (BIC), defined as the observed-data log-likelihood minus a penalty term based on the sample size $N$, is a popular model selection criterion for factor analysis with complete data. This definition has also been suggested for incomplete data. However, a penalty term based on the ‘complete’ sample size $N$ is the same whether the data are complete or incomplete. With incomplete data, there are often only $N_i < N$ observations for variable $i$, so using the ‘complete’ sample size $N$ implausibly ignores the amount of missing information inherent in incomplete data. Given this observation, a novel hierarchical BIC (HBIC) criterion for factor analysis with incomplete data, denoted HBICinc, is proposed. Its novelty is that HBICinc uses only the actual amounts of observed information, namely the $N_i$, in the penalty term. Theoretically, HBICinc is shown to be a large-sample approximation of the variational Bayesian (VB) lower bound, and BIC a further approximation of HBICinc, which means that HBICinc shares the theoretical consistency of BIC. Experiments on synthetic and real data sets are conducted to assess the finite-sample performance of HBICinc, BIC, and related criteria under various missing rates. The results show that HBICinc and BIC perform similarly when the missing rate is small, but HBICinc is more accurate when the missing rate is not small.

Vol. 19, No. 1, pp. 209–235. Open-access PDF: https://link.springer.com/content/pdf/10.1007/s11634-024-00582-w.pdf

Citations: 0
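To make the contrast between the two penalties concrete, here is a minimal sketch. It assumes that each of the p variables carries d_i free parameters (for a q-factor model, one row of loadings plus a mean and a noise variance, so d_i = q + 2, ignoring the rotational constraint, which is zero when q = 1). It further assumes that an HBICinc-style penalty replaces log N with log N_i for those parameters. The exact HBICinc formula in the paper may differ; `hbic_inc_sketch` is a hypothetical name, and the log-likelihood value below is a placeholder, not a fitted result.

```python
import math
from typing import Sequence


def bic(loglik: float, n_params: int, n: int) -> float:
    """BIC in the 'larger is better' form of the abstract: log L - (d / 2) log N."""
    return loglik - 0.5 * n_params * math.log(n)


def hbic_inc_sketch(loglik: float, params_per_var: Sequence[int], n_obs_per_var: Sequence[int]) -> float:
    """Hypothetical HBIC-style criterion: the d_i parameters tied to variable i
    are penalized with log N_i (its observed count) instead of log N."""
    if len(params_per_var) != len(n_obs_per_var):
        raise ValueError("one parameter count per variable is required")
    penalty = sum(0.5 * d_i * math.log(n_i) for d_i, n_i in zip(params_per_var, n_obs_per_var))
    return loglik - penalty


# Placeholder log-likelihood for a 1-factor model on p = 3 variables (d_i = 3 each).
loglik = -500.0
print(bic(loglik, 9, 100))                                 # uses N = 100 for every parameter
print(hbic_inc_sketch(loglik, [3, 3, 3], [100, 60, 40]))   # uses the observed N_i
```

When no values are missing (every N_i = N), the sketch reduces exactly to BIC, which fits the abstract's finding that the two criteria behave similarly at low missing rates. As the N_i shrink, the penalty shrinks with them.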

Estimators of various kappa coefficients based on the unbiased estimator of the expected index of agreements

IF 1.4, Zone 4 (Computer Science)

Advances in Data Analysis and Classification Pub Date : 2024-03-06 DOI: 10.1007/s11634-024-00581-x

A. Martín Andrés, M. Álvarez Hernández

Abstract: To measure the degree of agreement between R observers who independently classify n subjects into K categories, various kappa-type coefficients are often used. When R = 2, it is common to use Cohen's kappa, Scott's pi, Gwet's AC1/2, and Krippendorff's alpha coefficients (weighted or not). When R > 2, a pairwise version of the aforementioned coefficients is normally used; in the same order as above, these are Hubert's kappa, Fleiss's kappa, Gwet's AC1/2, and Krippendorff's alpha. However, all these statistics are based on biased estimators of the expected index of agreements, since they estimate the product of two population proportions by the product of their sample estimators. This article has three aims. First, to provide statistics based on unbiased estimators of the expected index of agreements and to determine their variance from the variance of the original statistic. Second, to make pairwise extensions of some measures. Third, to show that the old and new estimators of Cohen's kappa and Hubert's kappa coefficients match the well-known estimators of the concordance and intraclass correlation coefficients, if the former are defined with quadratic weights. The article shows that the new estimators are always greater than or equal to the classic ones, except for Gwet's coefficient, where the reverse holds, although these differences are only relevant for small sample sizes (e.g., n ≤ 30).

Vol. 19, No. 1, pp. 177–207. Open-access PDF: https://link.springer.com/content/pdf/10.1007/s11634-024-00581-x.pdf

Citations: 0
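The core idea of replacing a biased plug-in product of marginal proportions with an unbiased one can be illustrated for the simplest case: unweighted Cohen's kappa with two raters. The following is a sketch that assumes multinomial sampling of the K × K table. Under that assumption, E[n_i· n_·i − n_ii] = n(n − 1) π_i· π_·i, so (n_i· n_·i − n_ii) / (n(n − 1)) is unbiased for π_i· π_·i. The paper's own estimators (weighted versions, other coefficients, variances) may be constructed differently, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def _kappa(p_o: float, p_e: float) -> float:
    """Kappa-type coefficient (p_o - p_e) / (1 - p_e)."""
    return (p_o - p_e) / (1.0 - p_e)


def cohen_kappa_classic_and_unbiased(table: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Unweighted Cohen's kappa for two raters from a K x K table of counts.

    Returns (classic estimate, estimate using an unbiased expected agreement).
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    if n < 2:
        raise ValueError("need at least two subjects")
    rows = [sum(row) for row in table]                              # n_{i.}
    cols = [sum(table[i][j] for i in range(k)) for j in range(k)]   # n_{.j}
    p_o = sum(table[i][i] for i in range(k)) / n                    # observed agreement
    # Classic plug-in: sum_i (n_{i.}/n)(n_{.i}/n), biased for sum_i pi_{i.} pi_{.i}
    p_e_classic = sum(rows[i] * cols[i] for i in range(k)) / n ** 2
    # Unbiased under multinomial sampling: subtract the diagonal count shared by both margins
    p_e_unbiased = sum(rows[i] * cols[i] - table[i][i] for i in range(k)) / (n * (n - 1))
    return _kappa(p_o, p_e_classic), _kappa(p_o, p_e_unbiased)


classic, unbiased = cohen_kappa_classic_and_unbiased([[20, 5], [10, 15]])
print(classic, unbiased)  # 0.4 and about 0.405 (up to floating-point rounding)
```

Algebraically, the unbiased expected agreement equals (n·p_e − p_o)/(n − 1), which is below the classic p_e whenever p_o > p_e. That raises kappa slightly, and the gap vanishes as n grows, in line with the abstract's remark that the differences matter mainly for small samples.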

Special issue on “advances in models and learning for clustering and classification”

IF 1.4, Zone 4 (Computer Science)

Advances in Data Analysis and Classification Pub Date : 2024-02-27 DOI: 10.1007/s11634-024-00584-8

Luis-Angel García-Escudero, Salvatore Ingrassia, T. Brendan Murphy

Citations: 0

Spatial quantile clustering of climate data

IF 1.4, Zone 4 (Computer Science)

Advances in Data Analysis and Classification Pub Date : 2024-02-22 DOI: 10.1007/s11634-024-00580-y

Carlo Gaetan, Paolo Girardi, Victor Muthama Musau

Citations: 0

Robust functional logistic regression

IF 1.4, Zone 4 (Computer Science)

Advances in Data Analysis and Classification Pub Date : 2024-02-12 DOI: 10.1007/s11634-023-00577-z

Berkay Akturk, Ufuk Beyaztas, Han Lin Shang, Abhijit Mandal

Abstract: Functional logistic regression is a popular model for capturing a linear relationship between a binary response and functional predictor variables. However, many methods used for parameter estimation in functional logistic regression are sensitive to outliers, which may lead to inaccurate parameter estimates and inferior classification accuracy. We propose a robust estimation procedure for functional logistic regression in which the observations of the functional predictor are projected onto a set of finite-dimensional subspaces via robust functional principal component analysis. This dimension-reduction step reduces the effect of outliers in the functional predictor. The logistic regression coefficient is estimated using an M-type estimator based on the binary response and the robust principal component scores. In doing so, we obtain robust estimates by minimizing the effects of outliers in both the binary response and the functional predictor variables. Using a series of Monte Carlo simulations and hand radiograph data, we examine parameter estimation and classification accuracy for the response variable. We find that the robust procedure outperforms some existing robust and non-robust methods when outliers are present, while producing competitive results when outliers are absent. In addition, the proposed method is computationally more efficient than some existing robust alternatives.

Vol. 19, No. 1, pp. 121–145. Open-access PDF: https://link.springer.com/content/pdf/10.1007/s11634-023-00577-z.pdf

Citations: 0
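The abstract's second step fits a logistic model on principal component scores in a way that limits the pull of outliers. The sketch below illustrates that step on already-computed one-dimensional scores. It is not the authors' M-type estimator. As a simpler, plainly different stand-in, it uses a Mallows-type weighted logistic regression: scores lying more than c scaled MADs from the median are downweighted, and the weighted log-likelihood is maximized by Newton's method. The robust FPCA step is omitted, and the cutoff c = 2.5 and the function names are assumptions.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple


def _sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def mallows_weights(scores: Sequence[float], c: float = 2.5) -> List[float]:
    """Weight 1 within c scaled MADs of the median, c / distance beyond it."""
    med = median(scores)
    mad = 1.4826 * median([abs(s - med) for s in scores])  # MAD scaled for normal consistency
    if mad == 0.0:
        return [1.0] * len(scores)
    weights = []
    for s in scores:
        dist = abs(s - med) / mad
        weights.append(1.0 if dist <= c else c / dist)
    return weights


def weighted_logistic(x: Sequence[float], y: Sequence[int], w: Sequence[float],
                      iters: int = 30) -> Tuple[float, float]:
    """Fit logit P(y = 1) = b0 + b1 * x by Newton's method on the weighted log-likelihood."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, wi in zip(x, y, w):
            p = _sigmoid(b0 + b1 * xi)
            r = wi * (yi - p)          # weighted score contribution
            g0 += r
            g1 += r * xi
            s = wi * p * (1.0 - p)     # weighted information contribution
            h00 += s
            h01 += s * xi
            h11 += s * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + I^{-1} g for the 2 x 2 information matrix I
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


x = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 10.0]  # last score is a leverage outlier
y = [0, 0, 0, 1, 0, 1, 1, 1, 1, 0]
print(weighted_logistic(x, y, [1.0] * len(x)))    # ordinary maximum likelihood
print(weighted_logistic(x, y, mallows_weights(x)))  # outlier downweighted
```

Only the score at 10.0 receives a weight below 1, so its negative pull on the slope is reduced and the robust slope comes out larger than the ordinary one. Weighting in score space limits leverage outliers in the predictor. It does not address mislabeled responses, which the paper's M-type estimator also targets.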

Neural networks with functional inputs for multi-class supervised classification of replicated point patterns

IF 1.4, Zone 4 (Computer Science)

Advances in Data Analysis and Classification Pub Date : 2024-02-07 DOI: 10.1007/s11634-024-00579-5

Kateřina Pawlasová, Iva Karafiátová, Jiří Dvořák

Abstract: A spatial point pattern is a collection of points observed in a bounded region of the Euclidean plane or space. With the rapid development of modern imaging methods, large datasets of point patterns have become available, representing, for example, subcellular location patterns of human proteins or large forest populations. The main goal of this paper is to show that the supervised multi-class classification task for this particular type of complex data can be solved with functional neural networks. To predict the class membership of a newly observed point pattern, we compute an empirical estimate of a selected functional characteristic and treat the estimated function as a functional variable entering the network. In a simulation study, we show that the neural network approach outperforms the kernel regression classifier, which we consider the benchmark method in the point pattern setting. We also analyse a real dataset of point patterns of intramembranous particles and illustrate the practical applicability of the proposed method.

Vol. 18, No. 3, pp. 705–721. Open-access PDF: https://link.springer.com/content/pdf/10.1007/s11634-024-00579-5.pdf

Citations: 0
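The abstract turns each point pattern into an empirical estimate of a functional characteristic, which then enters the network as a functional input. It does not say here which characteristic is used. The minimal sketch below assumes Ripley's K-function, evaluated on a fixed grid of radii with one common normalization, |W| / (n(n − 1)), and no edge correction. It shows only how a pattern becomes a fixed-length vector a network could consume; the function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def empirical_k_function(points: Sequence[Point], radii: Sequence[float],
                         area: float = 1.0) -> List[float]:
    """Naive K estimate on a grid of radii, without edge correction:

    K(r) = |W| / (n (n - 1)) * #{ordered pairs i != j with ||x_i - x_j|| <= r}.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    dists = [math.dist(points[i], points[j]) for i in range(n) for j in range(n) if i != j]
    scale = area / (n * (n - 1))
    return [scale * sum(1 for d in dists if d <= r) for r in radii]


# A tiny pattern in the unit square; each pattern maps to a vector of the same length.
pattern = [(0.0, 0.0), (0.1, 0.0), (0.5, 0.5)]
print(empirical_k_function(pattern, [0.2, 0.65, 1.0]))
```

Because every pattern is evaluated on the same radii, patterns with different numbers of points yield inputs of equal length. In practice an edge-corrected estimator and a finer grid would usually be preferred.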

k-means clustering for persistent homology

IF 1.4, Zone 4 (Computer Science)

Advances in Data Analysis and Classification Pub Date : 2024-01-31 DOI: 10.1007/s11634-023-00578-y

Yueqi Cao, Prudence Leung, Anthea Monod

Citations: 0

RGA: a unified measure of predictive accuracy

IF 1.4, Zone 4 (Computer Science)

Advances in Data Analysis and Classification Pub Date : 2024-01-17 DOI: 10.1007/s11634-023-00574-2

Paolo Giudici, Emanuela Raffinetti

Citations: 0