Random Forest with Random Projection to Impute Missing Gene Expression Data
Lovedeep Gondara. 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA), 2015-12-01. DOI: 10.1109/ICMLA.2015.29 (https://doi.org/10.1109/ICMLA.2015.29)
Measurement error or an inadequate experimental setup often results in invalid or missing data in gene expression studies. Small sample sizes and the cost of re-running experiments create a need for an efficient missing-data imputation technique. In this paper, we propose a method based on random forest that uses random projection as a data pre-processing filter. Initial results, obtained with varying proportions of missing data on a variety of real datasets, show that random forest-based imputation performs as well as or better than methods based on K-Nearest Neighbor and Support Vector Regression. We also show that random projection can reduce the dimensionality of a dataset by 50 percent without affecting the imputation process.
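As a rough illustration of the pre-processing idea only, the sketch below applies a Gaussian random projection that halves the number of features and then checks how well pairwise distances between rows survive. It is not the paper's implementation: the synthetic 20 x 200 matrix, the Gaussian projection matrix, the seeds, and the helper names `gaussian_random_projection`, `euclidean`, and `distance_ratios` are all assumptions made for this example, and the random forest imputation step itself is not shown.

```python
import math
import random


def gaussian_random_projection(rows, n_components, seed=0):
    """Project each row onto n_components random Gaussian directions (hypothetical helper)."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    # Scaling by 1/sqrt(n_components) keeps expected squared lengths unchanged.
    scale = 1.0 / math.sqrt(n_components)
    directions = [
        [rng.gauss(0.0, 1.0) * scale for _ in range(n_features)]
        for _ in range(n_components)
    ]
    return [
        [sum(w * x for w, x in zip(direction, row)) for direction in directions]
        for row in rows
    ]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_ratios(original, projected):
    """Projected distance divided by original distance, for every pair of rows."""
    ratios = []
    for i in range(len(original)):
        for j in range(i + 1, len(original)):
            ratios.append(
                euclidean(projected[i], projected[j])
                / euclidean(original[i], original[j])
            )
    return ratios


# Assumption: a synthetic stand-in for an expression matrix (20 rows x 200 features).
data_rng = random.Random(42)
expression = [[data_rng.gauss(0.0, 1.0) for _ in range(200)] for _ in range(20)]

# Keep half of the original dimensionality, mirroring the 50 percent reduction.
reduced = gaussian_random_projection(expression, n_components=100, seed=1)

ratios = distance_ratios(expression, reduced)
mean_ratio = sum(ratios) / len(ratios)
print(f"mean distance ratio after projection: {mean_ratio:.3f}")
```

A mean ratio close to 1 suggests that the geometry an imputation model relies on is largely preserved after projection, which is one plausible reading of why halving the dimensionality need not hurt imputation. In practice the projected features would be passed to a regressor such as a random forest to predict the missing entries.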