Sparse vertex discriminant analysis: Variable selection for biomedical classification applications

Impact Factor 1.6 · CAS Zone 3 (Mathematics) · JCR Q3 · COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Computational Statistics & Data Analysis Pub Date : 2025-01-07 DOI:10.1016/j.csda.2025.108125

Alfonso Landeros, Seyoon Ko, Jack Z. Chang, Tong Tong Wu, Kenneth Lange

Volume 206, Article 108125. https://www.sciencedirect.com/science/article/pii/S0167947325000015

Citations: 0

Abstract

Modern biomedical datasets are often high-dimensional at multiple levels of biological organization. Practitioners must therefore grapple with data to estimate sparse or low-rank structures so as to adhere to the principle of parsimony. Further complicating matters is the presence of groups in data, each of which may have distinct associations with explanatory variables or be characterized by fundamentally different covariates. These themes in data analysis are explored in the context of classification. Vertex Discriminant Analysis (VDA) offers flexible linear and nonlinear models for classification that generalize the advantages of support vector machines to data with multiple classes. The proximal distance principle, which leverages projection and proximal operators in the design of practical algorithms, handily facilitates variable selection in VDA via nonconvex distance-to-set penalties directly controlling the number of active variables. Two flavors of sparse VDA are developed to address data in which instances may be homogeneous or heterogeneous with respect to predictors characterizing classes. Empirical studies illustrate how VDA is adapted to class-specific variable selection on simulated and real datasets, with an emphasis on applications to cancer classification via gene expression patterns.
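The distance-to-set penalty mentioned in the abstract can be illustrated with a minimal sketch. It assumes the constraint set is the set of coefficient vectors with at most k nonzero entries, and that the penalty is the squared Euclidean distance to that set scaled by rho/2, a form commonly used with the proximal distance principle. The function names and the parameters `k` and `rho` are hypothetical, and this is not the authors' implementation.

```python
from typing import List


def project_k_sparse(beta: List[float], k: int) -> List[float]:
    """Project beta onto the set of vectors with at most k nonzero entries.

    The Euclidean projection keeps the k largest-magnitude entries and zeros
    the rest. Ties are broken by index order, which is an arbitrary choice.
    """
    if k <= 0:
        return [0.0] * len(beta)
    # Indices ordered by decreasing magnitude; sorted() is stable.
    order = sorted(range(len(beta)), key=lambda i: -abs(beta[i]))
    keep = set(order[:k])
    return [b if i in keep else 0.0 for i, b in enumerate(beta)]


def sparsity_penalty(beta: List[float], k: int, rho: float) -> float:
    """Return (rho / 2) * dist(beta, S_k)^2, where S_k is the k-sparse set."""
    proj = project_k_sparse(beta, k)
    return 0.5 * rho * sum((b - p) ** 2 for b, p in zip(beta, proj))


if __name__ == "__main__":
    beta = [0.1, -2.0, 0.5, 3.0]
    print(project_k_sparse(beta, 2))        # keeps the entries -2.0 and 3.0
    print(sparsity_penalty(beta, 2, 10.0))  # penalizes the two dropped entries
```

In this sketch the penalty is zero exactly when the vector already has at most k nonzero entries, which is how such a penalty can control the number of active variables directly instead of shrinking all coefficients.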


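Source journal: Computational Statistics & Data Analysis (Mathematics; Computer Science, Interdisciplinary Applications)
CiteScore: 3.70
Self-citation rate: 5.60%
Articles published: 167
Review time: 60 days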

About the journal: Computational Statistics and Data Analysis (CSDA), an Official Publication of the network Computational and Methodological Statistics (CMStatistics) and of the International Association for Statistical Computing (IASC), is an international journal dedicated to the dissemination of methodological research and applications in the areas of computational statistics and data analysis. The journal consists of four refereed sections which are divided into the following subject areas: I) Computational Statistics - Manuscripts dealing with: 1) the explicit impact of computers on statistical methodology (e.g., Bayesian computing, bioinformatics, computer graphics, computer intensive inferential methods, data exploration, data mining, expert systems, heuristics, knowledge based systems, machine learning, neural networks, numerical and optimization methods, parallel computing, statistical databases, statistical systems), and 2) the development, evaluation and validation of statistical software and algorithms. Software and algorithms can be submitted with manuscripts and will be stored together with the online article. II) Statistical Methodology for Data Analysis - Manuscripts dealing with novel and original data analytical strategies and methodologies applied in biostatistics (design and analytic methods for clinical trials, epidemiological studies, statistical genetics, or genetic/environmental interactions), chemometrics, classification, data exploration, density estimation, design of experiments, environmetrics, education, image analysis, marketing, model free data exploration, pattern recognition, psychometrics, statistical physics, image processing, robust procedures. [...] III) Special Applications - [...] IV) Annals of Statistical Data Science [...]