Automated Cell Recognition using Single-cell RNA Sequencing with Machine Learning
Authors: Chengqi Xu, Yuetian Chen, Yiyang Cao
Published in: Proceedings of the 2021 5th International Conference on Computational Biology and Bioinformatics
Publication date: 2021-12-26
DOI: 10.1145/3512452.3512455 (https://doi.org/10.1145/3512452.3512455)
Citations: 0
Abstract
This paper investigates the strengths and limitations of different dimensionality reduction schemes and classification methods on specific single-cell RNA sequencing (scRNA-seq) data sets. Through systematic analysis and controlled-variable experiments, a pipeline was constructed that goes from RPKM data to final cell type recognition. Multiple dimensionality reduction methods (PCA, AutoEncoder, ISOMAP, and the combination of PCA+t-SNE) and multiple classifiers (Random Forest, Support Vector Machine, etc.) are applied to measure the difference in accuracy between the resulting solutions. By comparing how different models and parameters affect the final classification accuracy, the paper summarizes the information loss and classification performance of the different processing schemes on the data set, gives an outlook on them, and seeks the best combination among them. The combination of PCA+SVM obtained the highest overall accuracy, 53.13%. Based on this result, the paper further explores the possibility of improving accuracy and of applying the model through transfer learning in a wider range of applications.
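The comparison described in the abstract, pairing each dimensionality reduction method with each classifier and ranking the pairs by accuracy, can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes scikit-learn, uses a synthetic matrix in place of real RPKM data, and the function name `compare_pipelines`, the number of components, and the classifier settings are all hypothetical choices. The AutoEncoder and PCA+t-SNE variants are left out to keep the sketch short; scikit-learn's `TSNE` also has no `transform` method for unseen samples, so it would not fit into a cross-validated pipeline without extra work.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.manifold import Isomap
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_pipelines(X, y, n_components=10, cv=5, seed=0):
    """Return {(reducer_name, classifier_name): mean cross-validated accuracy}.

    Hypothetical helper: the names and defaults are assumptions for this sketch.
    """
    # Factories so that every pipeline gets fresh, unfitted estimators.
    reducers = {
        "PCA": lambda: PCA(n_components=n_components, random_state=seed),
        "ISOMAP": lambda: Isomap(n_components=n_components),
    }
    classifiers = {
        "SVM": lambda: SVC(kernel="rbf"),
        "RandomForest": lambda: RandomForestClassifier(
            n_estimators=100, random_state=seed
        ),
    }
    results = {}
    for r_name, make_reducer in reducers.items():
        for c_name, make_clf in classifiers.items():
            # Scaling and reduction are refit inside each fold, so no
            # information from the held-out cells leaks into training.
            model = make_pipeline(StandardScaler(), make_reducer(), make_clf())
            scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
            results[(r_name, c_name)] = float(scores.mean())
    return results


# Synthetic stand-in for an expression matrix (rows: cells, columns: genes).
# Real RPKM values would normally be log-transformed before this step.
X, y = make_classification(
    n_samples=200, n_features=100, n_informative=20,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)
results = compare_pipelines(X, y)
best = max(results, key=results.get)
for (r_name, c_name), acc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{r_name}+{c_name}: {acc:.3f}")
print("best combination:", "+".join(best))
```

Because the data here are synthetic, the printed accuracies say nothing about the 53.13% figure reported above; the sketch only shows the shape of the experiment, with one pipeline per reducer and classifier pair scored under the same cross-validation split.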