Assessment of Projection Pursuit Index for Classifying High Dimension Low Sample Size Data in R

Zhaoxing Wu, Chunming Zhang
{"title":"Assessment of Projection Pursuit Index for Classifying High Dimension Low Sample Size Data in R","authors":"Zhaoxing Wu, Chunming Zhang","doi":"10.6339/23-jds1096","DOIUrl":null,"url":null,"abstract":"Analyzing “large p small n” data is becoming increasingly paramount in a wide range of application fields. As a projection pursuit index, the Penalized Discriminant Analysis ($\\mathrm{PDA}$) index, built upon the Linear Discriminant Analysis ($\\mathrm{LDA}$) index, is devised in Lee and Cook (2010) to classify high-dimensional data with promising results. Yet, there is little information available about its performance compared with the popular Support Vector Machine ($\\mathrm{SVM}$). This paper conducts extensive numerical studies to compare the performance of the $\\mathrm{PDA}$ index with the $\\mathrm{LDA}$ index and $\\mathrm{SVM}$, demonstrating that the $\\mathrm{PDA}$ index is robust to outliers and able to handle high-dimensional datasets with extremely small sample sizes, few important variables, and multiple classes. Analyses of several motivating real-world datasets reveal the practical advantages and limitations of individual methods, suggesting that the $\\mathrm{PDA}$ index provides a useful alternative tool for classifying complex high-dimensional data. These new insights, along with the hands-on implementation of the $\\mathrm{PDA}$ index functions in the R package classPP, facilitate statisticians and data scientists to make effective use of both sets of classification tools.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":"1 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of data science : JDS","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.6339/23-jds1096","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Analyzing “large p small n” data is becoming increasingly paramount in a wide range of application fields. As a projection pursuit index, the Penalized Discriminant Analysis ($\mathrm{PDA}$) index, built upon the Linear Discriminant Analysis ($\mathrm{LDA}$) index, is devised in Lee and Cook (2010) to classify high-dimensional data with promising results. Yet, there is little information available about its performance compared with the popular Support Vector Machine ($\mathrm{SVM}$). This paper conducts extensive numerical studies to compare the performance of the $\mathrm{PDA}$ index with the $\mathrm{LDA}$ index and $\mathrm{SVM}$, demonstrating that the $\mathrm{PDA}$ index is robust to outliers and able to handle high-dimensional datasets with extremely small sample sizes, few important variables, and multiple classes. Analyses of several motivating real-world datasets reveal the practical advantages and limitations of individual methods, suggesting that the $\mathrm{PDA}$ index provides a useful alternative tool for classifying complex high-dimensional data. These new insights, along with the hands-on implementation of the $\mathrm{PDA}$ index functions in the R package classPP, facilitate statisticians and data scientists to make effective use of both sets of classification tools.
R中高维低样本量数据分类的投影寻踪指标评价
分析“大p小n”数据在广泛的应用领域中变得越来越重要。Lee和Cook(2010)在线性判别分析($\mathrm{LDA}$)指数的基础上,设计了惩罚判别分析($\mathrm{PDA}$)指数作为投影追踪指标,用于对高维数据进行分类,并取得了很好的结果。然而,与流行的支持向量机($\mathrm{SVM}$)相比,关于其性能的信息很少。本文进行了大量的数值研究,比较了$\ mathm {PDA}$索引与$\ mathm {LDA}$索引和$\ mathm {SVM}$索引的性能,表明$\ mathm {PDA}$索引对异常值具有鲁棒性,能够处理极小样本量、很少重要变量和多类的高维数据集。对几个真实世界数据集的分析揭示了单个方法的实际优势和局限性,表明$\ mathm {PDA}$索引为复杂的高维数据分类提供了一个有用的替代工具。这些新的见解,以及R包类spp中$\ mathm {PDA}$索引函数的实际实现,使统计学家和数据科学家能够有效地使用这两组分类工具。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信