Sufficient Dimension Reduction with Deep Neural Networks for Phenotype Prediction
Siqi Liang, Wei-Heng Huang, F. Liang
Proceedings of the 3rd International Conference on Statistics: Theory and Applications, August 2021
DOI: 10.11159/icsta21.134
Citations: 3
Abstract
Phenotype prediction from genome-wide SNPs or biomarkers is a difficult problem in biomedical research for several reasons, including the nonlinearity of the underlying genetic mapping, the high dimensionality of SNP data, and the small number of available training samples. To address this difficulty, we propose a split-and-merge deep neural network (SM-DNN) method, which applies the split-and-merge technique to deep neural networks to obtain a nonlinear sufficient dimension reduction of the input data and then trains a deep neural network on the dimension-reduced data. We show that the DNN-based dimension reduction is sufficient, that is, it retains all information about the response contained in the explanatory variables. Our numerical experiments indicate that the SM-DNN method can significantly improve phenotype prediction on a variety of real data examples. In particular, using only rare variants, we achieved a prediction accuracy of over 74% on the Early-Onset Myocardial Infarction (EOMI) exome sequence data.
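The sufficiency property claimed for the reduction can be stated formally using the standard definition from the sufficient dimension reduction literature. The symbols below ($X$, $Y$, $h$, $p$, $d$) are our own notation and not necessarily the paper's, and the density form assumes conditional densities exist:

```latex
% X \in \mathbb{R}^p : explanatory variables (e.g. SNPs);  Y : phenotype
% h : \mathbb{R}^p \to \mathbb{R}^d, \; d \ll p : the learned reduction
Y \perp\!\!\!\perp X \mid h(X)
\quad\Longleftrightarrow\quad
p(y \mid x) = p\bigl(y \mid h(x)\bigr) \quad \text{for all } x, y
```

Under this condition the best predictor of $Y$ depends on $X$ only through $h(X)$, so a model trained on the reduced data loses no predictive information relative to one trained on $X$ itself.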
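The two-stage pipeline (reduce, then predict) can be illustrated with a minimal sketch. This is not the authors' implementation, and the details of their split-and-merge technique are not given in the abstract; we assume here that "split" means partitioning the input columns (e.g. SNPs) into blocks, that each block is reduced by the hidden layer of a small network trained to predict a binary phenotype, and that "merge" concatenates the block-wise reductions before a final network is trained on them. All names (`BottleneckNet`, `split_columns`, `sm_fit`) and settings (block count, bottleneck width `k`, plain SGD, one hidden layer) are hypothetical, and the code uses only the Python standard library so it runs without dependencies.

```python
import math
import random


def sigmoid(t):
    # Numerically safe logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


class BottleneckNet:
    """Hypothetical net: x -> z = tanh(W x + b) (k units) -> sigmoid(v . z + c).

    The k-dimensional hidden output z is used as the reduced representation.
    """

    def __init__(self, d, k, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.gauss(0.0, 1.0 / math.sqrt(d)) for _ in range(d)] for _ in range(k)]
        self.b = [0.0] * k
        self.v = [rng.gauss(0.0, 1.0 / math.sqrt(k)) for _ in range(k)]
        self.c = 0.0

    def reduce(self, x):
        # Hidden-layer (bottleneck) features: the candidate reduction of x.
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bj)
                for row, bj in zip(self.W, self.b)]

    def predict_proba(self, x):
        z = self.reduce(x)
        return sigmoid(sum(vj * zj for vj, zj in zip(self.v, z)) + self.c)

    def fit(self, X, y, epochs=50, lr=0.1):
        # Plain per-sample SGD on the cross-entropy loss for a 0/1 label.
        for _ in range(epochs):
            for x, t in zip(X, y):
                z = self.reduce(x)
                p = sigmoid(sum(vj * zj for vj, zj in zip(self.v, z)) + self.c)
                g = p - t  # d(loss)/d(logit) for sigmoid + cross-entropy
                for j, zj in enumerate(z):
                    gz = g * self.v[j] * (1.0 - zj * zj)  # backprop through tanh
                    self.v[j] -= lr * g * zj
                    self.b[j] -= lr * gz
                    row = self.W[j]
                    for i, xi in enumerate(x):
                        row[i] -= lr * gz * xi
                self.c -= lr * g
        return self


def split_columns(X, n_blocks):
    # Partition column indices into at most n_blocks contiguous blocks.
    d = len(X[0])
    size = math.ceil(d / n_blocks)
    return [list(range(s, min(s + size, d))) for s in range(0, d, size)]


def sm_fit(X, y, n_blocks=2, k=2, epochs=50, lr=0.1):
    """Hypothetical split-and-merge pipeline; returns x -> P(y = 1 | x)."""
    blocks = split_columns(X, n_blocks)
    # Split: train one bottleneck net per column block.
    nets = []
    for idx in blocks:
        Xb = [[row[i] for i in idx] for row in X]
        nets.append(BottleneckNet(len(idx), k, seed=len(nets)).fit(Xb, y, epochs, lr))

    def merge(x):
        # Merge: concatenate the block-wise reduced features.
        feats = []
        for idx, net in zip(blocks, nets):
            feats.extend(net.reduce([x[i] for i in idx]))
        return feats

    # Train the final predictor on the merged, dimension-reduced data.
    Z = [merge(x) for x in X]
    final = BottleneckNet(len(Z[0]), k, seed=99).fit(Z, y, epochs, lr)
    return lambda x: final.predict_proba(merge(x))
```

For example, `predict = sm_fit(X, y, n_blocks=2, k=2)` returns a function mapping a raw input row to an estimated probability that the phenotype equals 1. Training each network on one block keeps its input small, which is one plausible motivation for splitting when the number of SNPs far exceeds the number of samples; whether this matches the paper's design is an assumption.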