W. Ye, Ling Zhang, Wenqing Zhang, Xiaojiao Wu, Dong Yi, Yazhou Wu
Biostatistics and Epidemiology, vol. 6, no. 1, pp. 113-127. Journal Article. Published 2022-01-02. DOI: https://doi.org/10.1080/24709360.2021.2023805
A comparison of single imputation and multiple imputation methods for missing data in different oncogene expression profiles
This study evaluated the multiple-imputation (MI) method for missing data in gene expression profiles, comparing it with 3 single-imputation (SI) methods across different datasets and percentages of missing values. Using 3 gene expression profile datasets from human colon cancer, non-small cell lung cancer, and lymph cancer, different deletion rates and different numbers of imputations for MI were compared. Imputation and clustering performance were evaluated using the normalized root mean square error (NRMSE) and the gene clustering accuracy (F value). For all 4 methods, the NRMSE gradually increased as the percentage of missing values in the 3 datasets increased, whereas the F value gradually decreased. In all datasets and at every percentage of missing values, the NRMSE of MI was consistently lower than those of the 3 SI methods, and the F value of MI was the highest. The NRMSE of MI gradually decreased as the number of imputations increased and increased as the variability in the original datasets increased, and the datasets imputed by MI showed the best clustering results. These results indicate that MI develops and enriches imputation-model approaches and provides a solid foundation for subsequently establishing imputation strategies for gene expression profiles with missing data.
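The evaluation loop described in the abstract (delete known values at a chosen rate, impute them, then score the imputed values against the originals with NRMSE) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the abstract does not give the NRMSE formula, so the common form sqrt(MSE / variance of the true values) is assumed; the data are synthetic; `row_mean_impute` is a generic SI baseline that may not be one of the paper's 3 SI methods; and `averaged_stochastic_impute` is a hypothetical toy stand-in that only averages m noisy draws, whereas a real MI procedure draws from a fitted imputation model. All function names are invented for this sketch.

```python
import numpy as np


def nrmse(true_values, imputed_values, missing_mask):
    """NRMSE restricted to the artificially deleted entries.

    Assumed definition (not taken from the paper):
    sqrt(mean squared error / variance of the true values).
    """
    t = true_values[missing_mask]
    g = imputed_values[missing_mask]
    return float(np.sqrt(np.mean((g - t) ** 2) / np.var(t)))


def delete_at_random(data, rate, rng):
    """Copy `data`, set roughly `rate` of its entries to NaN, return copy and mask."""
    mask = rng.random(data.shape) < rate
    corrupted = data.copy()
    corrupted[mask] = np.nan
    return corrupted, mask


def row_mean_impute(corrupted):
    """Generic single imputation: fill each gene's (row's) gaps with its observed mean."""
    row_means = np.nanmean(corrupted, axis=1, keepdims=True)
    return np.where(np.isnan(corrupted), row_means, corrupted)


def averaged_stochastic_impute(corrupted, m, rng):
    """Toy stand-in for MI: draw m noisy fills per gap and average them.

    Each draw is the row mean plus Gaussian noise scaled by the row's
    observed standard deviation. This is NOT a full MI algorithm.
    """
    row_means = np.nanmean(corrupted, axis=1, keepdims=True)
    row_sds = np.nanstd(corrupted, axis=1, keepdims=True)
    draws = row_means + row_sds * rng.standard_normal((m,) + corrupted.shape)
    pooled = draws.mean(axis=0)
    return np.where(np.isnan(corrupted), pooled, corrupted)


# Synthetic "expression matrix": 200 genes x 20 samples, gene-specific means.
rng = np.random.default_rng(0)
gene_means = rng.normal(0.0, 2.0, size=(200, 1))
expression = gene_means + rng.normal(0.0, 1.0, size=(200, 20))

corrupted, mask = delete_at_random(expression, 0.10, rng)
baseline = nrmse(expression, row_mean_impute(corrupted), mask)
print(f"row-mean SI baseline: NRMSE={baseline:.3f}")
for m in (1, 5, 20):
    score = nrmse(expression, averaged_stochastic_impute(corrupted, m, rng), mask)
    print(f"toy pooled imputation, m={m}: NRMSE={score:.3f}")
```

The sketch shows only how the deletion experiment and the metric fit together. It does not reproduce the paper's ranking of methods, and the clustering accuracy (F value) is left out because the abstract does not describe the clustering procedure.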