C. LuisA.Santamaría, H. SarahíZuñiga, I. H. P. Torres, M. J. S. García, Mario Rossainz López
Res. Comput. Sci. Journal Article, published 2019-12-31. DOI: https://doi.org/10.13053/rcs-148-3-9
DNA Sequence Recognition using Image Representation
In recent years, machine learning has made considerable progress on difficult classification problems. This article addresses two tasks: recognizing DNA sequences and recognizing the boundaries between exons and introns, using a graphical representation of DNA sequences together with recent deep learning methods. The objective is to classify DNA sequences with a convolutional neural network (CNN). For sequence recognition, we used 1847 sequences covering four types of hepatitis C virus (types 1, 2, 3 and 6), taken from the repository available on the ViPR page. For exon-intron boundary recognition, we used the Molecular Biology (Splice-junction Gene Sequences) Data Set, available on the UCI page, which contains 3190 sequences in three classes: exon-intron boundary, intron-exon boundary, and neither. To process the DNA sequences, we designed a representation method in which each nitrogenous base is mapped to a gray level, so that a sequence forms a grayscale image. The generated images were used to train the CNN. The results obtained with the CNN trained on the hepatitis C virus database suggest that CNNs are suitable for classifying images generated from DNA sequences. This result led us to run the exon-intron boundary recognition experiments on the UCI database. The results were a training precision of 82%, a validation accuracy of 75% and an evaluation accuracy of 80.8%. We conclude that images of DNA sequences from the databases used can be classified.
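The abstract does not give the details of the gray-scale representation, so the following is only a minimal sketch of the general idea, not the authors' method. The gray levels in `GRAY_LEVELS`, the fallback `UNKNOWN_LEVEL`, the image width, and the function name `sequence_to_image` are all assumptions chosen for illustration.

```python
# Hypothetical gray levels (0-255); the abstract does not state the actual values.
GRAY_LEVELS = {"A": 0, "C": 85, "G": 170, "T": 255}

# Assumed gray level for ambiguous symbols (e.g. N) and for padding pixels.
UNKNOWN_LEVEL = 128


def sequence_to_image(seq, width):
    """Map each base to a gray pixel and wrap the pixels into rows of `width`.

    Returns a list of rows, each a list of integers in 0-255.
    The last row is padded with UNKNOWN_LEVEL if the sequence
    length is not a multiple of `width`.
    """
    pixels = [GRAY_LEVELS.get(base, UNKNOWN_LEVEL) for base in seq.upper()]
    # Pad so the pixel count is a multiple of the row width.
    pixels += [UNKNOWN_LEVEL] * (-len(pixels) % width)
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


if __name__ == "__main__":
    # Toy sequence, not taken from either data set.
    for row in sequence_to_image("ACGTTGCAAC", width=4):
        print(row)
```

Each row of the returned grid is one line of the image. An array of this form could be converted to a tensor and fed to a CNN, although the network architecture used in the article is not described in the abstract.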