Automated Cell Recognition using Single-cell RNA Sequencing with Machine Learning
Chengqi Xu, Yuetian Chen, Yiyang Cao
Proceedings of the 2021 5th International Conference on Computational Biology and Bioinformatics, published 2021-12-26
DOI: 10.1145/3512452.3512455 (https://doi.org/10.1145/3512452.3512455)
This paper investigates the strengths and limitations of different dimensionality reduction schemes and classification methods on specific single-cell RNA sequencing (scRNA-seq) data sets. Through systematic analysis and controlled-variable experiments, a pipeline was constructed that runs from RPKM data to final cell type recognition. Multiple dimensionality reduction methods (PCA, AutoEncoder, ISOMAP, and the combination of PCA+t-SNE) and multiple classifiers (Random Forest, Support Vector Machine, etc.) were applied to measure the accuracy differences among the resulting solutions. By comparing how different models and parameters affect the final classification accuracy, this paper summarizes the information loss and classification performance of the different processing schemes on the data set and seeks the best combination among them. The combination of PCA+SVM achieved the highest overall accuracy, 53.13%; based on this result, the paper further explores the possibility of improving accuracy and of transferring the model to a wider range of applications.
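The kind of comparison described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, uses a synthetic matrix as a stand-in for RPKM data, and covers only PCA followed by either an SVM or a Random Forest. The function name `compare_pipelines`, the log transform, and all parameter values (`n_components=20`, the RBF kernel, 200 trees, the 75/25 split) are assumptions chosen for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_pipelines(X, y, n_components=20, seed=0):
    """Fit PCA followed by each classifier and return held-out accuracy per combination.

    X is a (cells x genes) matrix of non-negative expression values and
    y holds one cell-type label per cell. Hypothetical helper for illustration.
    """
    # Log-transform to compress the dynamic range (an assumed preprocessing step).
    X_log = np.log1p(X)
    X_train, X_test, y_train, y_test = train_test_split(
        X_log, y, test_size=0.25, stratify=y, random_state=seed
    )
    classifiers = {
        "PCA+SVM": SVC(kernel="rbf", C=1.0),
        "PCA+RandomForest": RandomForestClassifier(n_estimators=200, random_state=seed),
    }
    scores = {}
    for name, clf in classifiers.items():
        # Scaling and PCA are fitted on the training split only, inside the pipeline.
        model = make_pipeline(
            StandardScaler(),
            PCA(n_components=n_components, random_state=seed),
            clf,
        )
        model.fit(X_train, y_train)
        scores[name] = model.score(X_test, y_test)
    return scores


# Synthetic stand-in for an expression matrix: 300 "cells", 500 "genes", 4 "cell types".
X_raw, y = make_classification(
    n_samples=300,
    n_features=500,
    n_informative=30,
    n_classes=4,
    n_clusters_per_class=1,
    random_state=0,
)
X = X_raw - X_raw.min()  # shift so all values are non-negative, like expression data

scores = compare_pipelines(X, y)
for name, acc in scores.items():
    print(f"{name}: accuracy = {acc:.4f}")
```

Keeping the scaler and PCA inside the pipeline means the reduction is learned from the training cells only, so the held-out accuracy is not inflated by information from the test split. Other reducers named in the abstract (ISOMAP, an autoencoder, PCA+t-SNE) would need their own handling; t-SNE in particular has no transform for unseen samples in scikit-learn, so it does not drop into this pattern directly.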