{"title":"结构化协变量空间中的稀疏线性判别分析","authors":"S. Safo, Q. Long","doi":"10.1002/sam.11376","DOIUrl":null,"url":null,"abstract":"Classification with high dimensional variables is a popular goal in many modern statistical studies. Fisher's linear discriminant analysis (LDA) is a common and effective tool for classifying entities into existing groups. It is well known that classification using Fisher's discriminant for high dimensional data is as bad as random guessing due to the many noise features that increases misclassification rate. Recently, it is being acknowledged that complex biological mechanisms occur through multiple features working together, though individually these features may contribute to noise accumulation in the data. In view of these, it is important to perform classification with discriminant vectors that use a subset of important variables, while also utilizing prior biological relationships among features. We tackle this problem in this article and propose methods that incorporate variable selection into the classification problem, for the identification of important biomarkers. Furthermore, we incorporate into the LDA problem prior information on the relationships among variables using undirected graphs in order to identify functionally meaningful biomarkers. We compare our methods to existing sparse LDA approaches via simulation studies and real data analysis.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Sparse Linear Discriminant Analysis in Structured Covariates Space\",\"authors\":\"S. Safo, Q. Long\",\"doi\":\"10.1002/sam.11376\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Classification with high dimensional variables is a popular goal in many modern statistical studies. Fisher's linear discriminant analysis (LDA) is a common and effective tool for classifying entities into existing groups. It is well known that classification using Fisher's discriminant for high dimensional data is as bad as random guessing due to the many noise features that increases misclassification rate. Recently, it is being acknowledged that complex biological mechanisms occur through multiple features working together, though individually these features may contribute to noise accumulation in the data. In view of these, it is important to perform classification with discriminant vectors that use a subset of important variables, while also utilizing prior biological relationships among features. We tackle this problem in this article and propose methods that incorporate variable selection into the classification problem, for the identification of important biomarkers. Furthermore, we incorporate into the LDA problem prior information on the relationships among variables using undirected graphs in order to identify functionally meaningful biomarkers. We compare our methods to existing sparse LDA approaches via simulation studies and real data analysis.\",\"PeriodicalId\":193885,\"journal\":{\"name\":\"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)\",\"volume\":\"6 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1002/sam.11376\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1002/sam.11376","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Sparse Linear Discriminant Analysis in Structured Covariates Space
Classification with high dimensional variables is a popular goal in many modern statistical studies. Fisher's linear discriminant analysis (LDA) is a common and effective tool for classifying entities into existing groups. It is well known that classification using Fisher's discriminant for high dimensional data is as bad as random guessing due to the many noise features that increases misclassification rate. Recently, it is being acknowledged that complex biological mechanisms occur through multiple features working together, though individually these features may contribute to noise accumulation in the data. In view of these, it is important to perform classification with discriminant vectors that use a subset of important variables, while also utilizing prior biological relationships among features. We tackle this problem in this article and propose methods that incorporate variable selection into the classification problem, for the identification of important biomarkers. Furthermore, we incorporate into the LDA problem prior information on the relationships among variables using undirected graphs in order to identify functionally meaningful biomarkers. We compare our methods to existing sparse LDA approaches via simulation studies and real data analysis.