{"title":"The Classification of Gene Sequencer Based on Machine Learning","authors":"Jie Yang, Yong Cao","doi":"10.1145/3511716.3511730","DOIUrl":null,"url":null,"abstract":"Abstract: Biological sequencing plays a very important role in life science, especially with the improvement of sequencing technology and the development of sequencing instruments, and a large number of biological sequencing quality data are produced every day. Because of different sequencers, the quality of sequencing is different. In the process of sequencing quality control, the model of sequencer can be deduced according to the quality of gene sequence. Therefore, in this paper, five sequencers of Illumina HiSeq series, Illumina HiSeq 2000, Illumina HiSeq 2500, Illumina HiSeq 3000, Illumina HiSeq 4000 and Illumina HiSeq XTen, are selected as the classification objects. Firstly, the sequencing quality data of the five sequencers are preprocessed. Then, the classification model is trained by three machine learning algorithms: decision tree, logistic regression and support vector machine. The experimental results show that the accuracy rates of the three machine learning algorithms are 96.67%, 97.50% and 97.50% respectively. These algorithms are very good to solve the problem of using biological sequencing data quality to classify sequencer.","PeriodicalId":105018,"journal":{"name":"Proceedings of the 2021 4th International Conference on E-Business, Information Management and Computer Science","volume":"111 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2021 4th International Conference on E-Business, Information Management and Computer Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3511716.3511730","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Abstract: Biological sequencing plays a very important role in life science, especially with the improvement of sequencing technology and the development of sequencing instruments, and a large number of biological sequencing quality data are produced every day. Because of different sequencers, the quality of sequencing is different. In the process of sequencing quality control, the model of sequencer can be deduced according to the quality of gene sequence. Therefore, in this paper, five sequencers of Illumina HiSeq series, Illumina HiSeq 2000, Illumina HiSeq 2500, Illumina HiSeq 3000, Illumina HiSeq 4000 and Illumina HiSeq XTen, are selected as the classification objects. Firstly, the sequencing quality data of the five sequencers are preprocessed. Then, the classification model is trained by three machine learning algorithms: decision tree, logistic regression and support vector machine. The experimental results show that the accuracy rates of the three machine learning algorithms are 96.67%, 97.50% and 97.50% respectively. These algorithms are very good to solve the problem of using biological sequencing data quality to classify sequencer.