Advances in Data Analysis and Classification: Latest Articles

Flexible mixture regression with the generalized hyperbolic distribution
IF 1.4, Zone 4, Computer Science
Advances in Data Analysis and Classification, Vol. 18, No. 1, pp. 33-60. Pub Date: 2023-01-04. DOI: 10.1007/s11634-022-00532-4
Nam-Hwui Kim, Ryan P. Browne
Abstract: When modeling the functional relationship between a response variable and covariates via linear regression, multiple relationships may be present depending on the underlying component structure. Deploying a flexible mixture distribution can help with capturing a wide variety of such structures, thereby successfully modeling the response–covariate relationship while addressing the components. In that spirit, a mixture regression model based on the finite mixture of generalized hyperbolic distributions is introduced, and its parameter estimation method is presented. The flexibility of the generalized hyperbolic distribution can identify better-fitting components, which can lead to a more meaningful functional relationship between the response variable and the covariates. In addition, we introduce an iterative component combining procedure to aid the interpretability of the model. The results from simulated and real data analyses indicate that our method offers a distinctive edge over some of the existing methods, and that it can generate useful insights on the data set at hand for further investigation.
Citations: 0
Sparse correspondence analysis for large contingency tables
IF 1.6, Zone 4, Computer Science
Advances in Data Analysis and Classification, Vol. 17, No. 4, pp. 1037-1056. Pub Date: 2023-01-02. DOI: 10.1007/s11634-022-00531-5
Ruiping Liu, Ndeye Niang, Gilbert Saporta, Huiwen Wang
Abstract: We propose sparse variants of correspondence analysis (CA) for large contingency tables such as the document-term matrices used in text mining. By seeking to obtain many zero coefficients, sparse CA remedies the difficulty of interpreting CA results when the table is large. Since CA is a doubly weighted PCA (for rows and columns) or a weighted generalized SVD, we adapt known sparse versions of these methods with specific developments to obtain orthogonal solutions and to tune the sparseness parameters. We distinguish two cases depending on whether sparseness is required for both rows and columns, or only for one set.
Citations: 2
Proximal methods for sparse optimal scoring and discriminant analysis
IF 1.6, Zone 4, Computer Science
Advances in Data Analysis and Classification, Vol. 17, No. 4, pp. 983-1036. Pub Date: 2022-12-21. DOI: 10.1007/s11634-022-00530-6
Summer Atkins, Gudmundur Einarsson, Line Clemmensen, Brendan Ames
Abstract: Linear discriminant analysis (LDA) is a classical method for dimensionality reduction, where discriminant vectors are sought to project data to a lower dimensional space for optimal separability of classes. Several recent papers have outlined strategies, based on exploiting sparsity of the discriminant vectors, for performing LDA in the high-dimensional setting where the number of features exceeds the number of observations in the data. However, many of these proposed methods lack scalable methods for solution of the underlying optimization problems. We consider an optimization scheme for solving the sparse optimal scoring formulation of LDA based on block coordinate descent. Each iteration of this algorithm requires an update of a scoring vector, which admits an analytic formula, and an update of the corresponding discriminant vector, which requires solution of a convex subproblem; we propose several variants of this algorithm where the proximal gradient method or the alternating direction method of multipliers is used to solve this subproblem. We show that the per-iteration cost of these methods scales linearly in the dimension of the data provided restricted regularization terms are employed, and cubically in the dimension of the data in the worst case. Furthermore, we establish that when this block coordinate descent framework generates convergent subsequences of iterates, then these subsequences converge to the stationary points of the sparse optimal scoring problem. We demonstrate the effectiveness of our new methods with empirical results for classification of Gaussian data and data sets drawn from benchmarking repositories, including time-series and multispectral X-ray data, and provide Matlab and R implementations of our optimization schemes.
Citations: 2
LASSO regularization within the LocalGLMnet architecture
IF 1.6, Zone 4, Computer Science
Advances in Data Analysis and Classification, Vol. 17, No. 4, pp. 951-981. Pub Date: 2022-12-13. DOI: 10.1007/s11634-022-00529-z
Ronald Richman, Mario V. Wüthrich
Abstract: Deep learning models have been very successful in the application of machine learning methods, often out-performing classical statistical models such as linear regression models or generalized linear models. On the other hand, deep learning models are often criticized for neither being explainable nor allowing for variable selection. There are two different ways of dealing with this problem: either we use post-hoc model interpretability methods or we design specific deep learning architectures that allow for an easier interpretation and explanation. This paper builds on our previous work on the LocalGLMnet architecture, which gives an interpretable deep learning architecture. In the present paper, we show how group LASSO regularization (and other regularization schemes) can be implemented within the LocalGLMnet architecture so that we obtain feature sparsity for variable selection. We benchmark our approach against the recently developed LassoNet of Lemhadri et al. (LassoNet: a neural network with feature sparsity. J Mach Learn Res 22:1–29, 2021).
Citations: 0
A power-controlled reliability assessment for multi-class probabilistic classifiers
IF 1.6, Zone 4, Computer Science
Advances in Data Analysis and Classification, Vol. 17, No. 4, pp. 927-949. Pub Date: 2022-11-17. DOI: 10.1007/s11634-022-00528-0
Hyukjun Gweon
Abstract: In multi-class classification, the output of a probabilistic classifier is a probability distribution of the classes. In this work, we focus on a statistical assessment of the reliability of probabilistic classifiers for multi-class problems. Our approach generates a Pearson \(\chi^2\) statistic based on the k-nearest neighbors in the prediction space. Further, we develop a Bayesian approach for estimating the expected power of the reliability test that can be used to select an appropriate sample size k. We propose a sampling algorithm and demonstrate that this algorithm obtains a valid prior distribution. The effectiveness of the proposed reliability test and expected power is evaluated through a simulation study. We also provide illustrative examples of the proposed methods with practical applications.
Citations: 1
A dual subspace parsimonious mixture of matrix normal distributions
IF 1.6, Zone 4, Computer Science
Advances in Data Analysis and Classification, Vol. 17, No. 3, pp. 801-822. Pub Date: 2022-11-16. DOI: 10.1007/s11634-022-00526-2
Alex Sharp, Glen Chalatov, Ryan P. Browne
Abstract: We present a parsimonious dual-subspace clustering approach for a mixture of matrix-normal distributions. By assuming certain principal components of the row and column covariance matrices are equally important, we express the model in fewer parameters without sacrificing discriminatory information. We derive update rules for an ECM algorithm and set forth necessary conditions to ensure identifiability. We use simulation to demonstrate parameter recovery, and we illustrate the parsimony and competitive performance of the model through two data analyses.
Citations: 1
Monitoring photochemical pollutants based on symbolic interval-valued data analysis
IF 1.6, Zone 4, Computer Science
Advances in Data Analysis and Classification, Vol. 17, No. 4, pp. 897-926. Pub Date: 2022-11-12. DOI: 10.1007/s11634-022-00527-1
Liang-Ching Lin, Meihui Guo, Sangyeol Lee
Abstract: This study considers monitoring photochemical pollutants for anomaly detection based on symbolic interval-valued data analysis. For this task, we construct control charts based on the principal component scores of symbolic interval-valued data. Herein, the symbolic interval-valued data are assumed to follow a normal distribution, and an approximate expectation formula of order statistics from the normal distribution is used in the univariate case to estimate the mean and variance via the method of moments. In addition, we consider the bivariate case wherein we use the maximum likelihood estimator calculated from the likelihood function derived under a bivariate copula. We also establish the procedures for the statistical control chart based on the univariate and bivariate interval-valued variables, and the procedures are potentially extendable to higher dimensional cases. Monte Carlo simulations and real data analysis using photochemical pollutants confirm the validity of the proposed method. The results particularly show its superiority over the conventional method that uses the averages to identify the date on which the abnormal maximum occurred.
Citations: 0
Editorial for ADAC issue 4 of volume 16 (2022)
IF 1.6, Zone 4, Computer Science
Advances in Data Analysis and Classification, Vol. 16, No. 4, pp. 817-821. Pub Date: 2022-10-31. DOI: 10.1007/s11634-022-00525-3
Maurizio Vichi, Andrea Cerioli, Hans A. Kestler, Akinori Okada, Claus Weihs
Citations: 0
Attraction-repulsion clustering: a way of promoting diversity linked to demographic parity in fair clustering
IF 1.6, Zone 4, Computer Science
Advances in Data Analysis and Classification, Vol. 17, No. 4, pp. 859-896. Pub Date: 2022-10-20. DOI: 10.1007/s11634-022-00516-4
Eustasio del Barrio, Hristo Inouzhe, Jean-Michel Loubes
Open access PDF: https://link.springer.com/content/pdf/10.1007/s11634-022-00516-4.pdf
Abstract: We consider the problem of diversity enhancing clustering, i.e., developing clustering methods which produce clusters that favour diversity with respect to a set of protected attributes such as race, sex, age, etc. In the context of fair clustering, diversity plays a major role when fairness is understood as demographic parity. To promote diversity, we introduce perturbations to the distance in the unprotected attributes that account for protected attributes in a way that resembles attraction-repulsion of charged particles in physics. These perturbations are defined through dissimilarities with a tractable interpretation. Cluster analysis based on attraction-repulsion dissimilarities penalizes homogeneity of the clusters with respect to the protected attributes and leads to an improvement in diversity. An advantage of our approach, which falls into a pre-processing set-up, is its compatibility with a wide variety of clustering methods and with non-Euclidean data. We illustrate the use of our procedures with both synthetic and real data and provide discussion about the relation between diversity, fairness, and cluster structure.
Citations: 0
A structured covariance ensemble for sufficient dimension reduction
IF 1.6, Zone 4, Computer Science
Advances in Data Analysis and Classification, Vol. 17, No. 3, pp. 777-800. Pub Date: 2022-10-19. DOI: 10.1007/s11634-022-00524-4
Qin Wang, Yuan Xue
Abstract: Sufficient dimension reduction (SDR) is a useful tool for high-dimensional data analysis. SDR aims at reducing the data dimensionality without loss of regression information between the response and its high-dimensional predictors. Many existing SDR methods are designed for data with continuous responses. Motivated by a recent work on aggregate dimension reduction (Wang in Stat Sin 30:1027–1048, 2020), we propose a unified SDR framework for both continuous and binary responses through a structured covariance ensemble. The connection with existing approaches is discussed in detail and an efficient algorithm is proposed. Numerical examples and a real data application demonstrate its satisfactory performance.
Citations: 1