{"title":"R中高维低样本量数据分类的投影寻踪指标评价","authors":"Zhaoxing Wu, Chunming Zhang","doi":"10.6339/23-jds1096","DOIUrl":null,"url":null,"abstract":"Analyzing “large p small n” data is becoming increasingly paramount in a wide range of application fields. As a projection pursuit index, the Penalized Discriminant Analysis ($\\mathrm{PDA}$) index, built upon the Linear Discriminant Analysis ($\\mathrm{LDA}$) index, is devised in Lee and Cook (2010) to classify high-dimensional data with promising results. Yet, there is little information available about its performance compared with the popular Support Vector Machine ($\\mathrm{SVM}$). This paper conducts extensive numerical studies to compare the performance of the $\\mathrm{PDA}$ index with the $\\mathrm{LDA}$ index and $\\mathrm{SVM}$, demonstrating that the $\\mathrm{PDA}$ index is robust to outliers and able to handle high-dimensional datasets with extremely small sample sizes, few important variables, and multiple classes. Analyses of several motivating real-world datasets reveal the practical advantages and limitations of individual methods, suggesting that the $\\mathrm{PDA}$ index provides a useful alternative tool for classifying complex high-dimensional data. These new insights, along with the hands-on implementation of the $\\mathrm{PDA}$ index functions in the R package classPP, facilitate statisticians and data scientists to make effective use of both sets of classification tools.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":"1 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Assessment of Projection Pursuit Index for Classifying High Dimension Low Sample Size Data in R\",\"authors\":\"Zhaoxing Wu, Chunming Zhang\",\"doi\":\"10.6339/23-jds1096\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Analyzing “large p small n” data is becoming increasingly paramount in a wide range of application fields. As a projection pursuit index, the Penalized Discriminant Analysis ($\\\\mathrm{PDA}$) index, built upon the Linear Discriminant Analysis ($\\\\mathrm{LDA}$) index, is devised in Lee and Cook (2010) to classify high-dimensional data with promising results. Yet, there is little information available about its performance compared with the popular Support Vector Machine ($\\\\mathrm{SVM}$). This paper conducts extensive numerical studies to compare the performance of the $\\\\mathrm{PDA}$ index with the $\\\\mathrm{LDA}$ index and $\\\\mathrm{SVM}$, demonstrating that the $\\\\mathrm{PDA}$ index is robust to outliers and able to handle high-dimensional datasets with extremely small sample sizes, few important variables, and multiple classes. Analyses of several motivating real-world datasets reveal the practical advantages and limitations of individual methods, suggesting that the $\\\\mathrm{PDA}$ index provides a useful alternative tool for classifying complex high-dimensional data. These new insights, along with the hands-on implementation of the $\\\\mathrm{PDA}$ index functions in the R package classPP, facilitate statisticians and data scientists to make effective use of both sets of classification tools.\",\"PeriodicalId\":73699,\"journal\":{\"name\":\"Journal of data science : JDS\",\"volume\":\"1 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of data science : JDS\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.6339/23-jds1096\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of data science : JDS","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.6339/23-jds1096","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Assessment of Projection Pursuit Index for Classifying High Dimension Low Sample Size Data in R
Analyzing “large p small n” data is becoming increasingly paramount in a wide range of application fields. As a projection pursuit index, the Penalized Discriminant Analysis ($\mathrm{PDA}$) index, built upon the Linear Discriminant Analysis ($\mathrm{LDA}$) index, is devised in Lee and Cook (2010) to classify high-dimensional data with promising results. Yet, there is little information available about its performance compared with the popular Support Vector Machine ($\mathrm{SVM}$). This paper conducts extensive numerical studies to compare the performance of the $\mathrm{PDA}$ index with the $\mathrm{LDA}$ index and $\mathrm{SVM}$, demonstrating that the $\mathrm{PDA}$ index is robust to outliers and able to handle high-dimensional datasets with extremely small sample sizes, few important variables, and multiple classes. Analyses of several motivating real-world datasets reveal the practical advantages and limitations of individual methods, suggesting that the $\mathrm{PDA}$ index provides a useful alternative tool for classifying complex high-dimensional data. These new insights, along with the hands-on implementation of the $\mathrm{PDA}$ index functions in the R package classPP, facilitate statisticians and data scientists to make effective use of both sets of classification tools.