Gustavo Henrique Ferreira Cruz, Vinícius Menossi, Josiane Melchiori Pinheiro, Antônio Roberto dos Santos, Gustavo Luiz Furuhata Ferreira, Sarah Anduca de Oliveira
Anais do XIII Computer on the Beach - COTB'22. Published 2022-07-13. DOI: 10.14210/cotb.v13.p236-242 (https://doi.org/10.14210/cotb.v13.p236-242)
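Machine Learning for the Identification of Coding Regions in DNA Sequences of Filamentous Fungi (original title: Aprendizagem de Máquina na identificação de regiões codantes em sequências de DNA de fungos filamentosos)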
Identifying intron and exon regions in genes is a complex task that requires recognizing certain nucleotide patterns in the gene sequence. It can be done manually or with software, which most often relies on genetic alignment techniques that are not very effective for this purpose. In this collaboration between biology and computer science, the objective was to use machine learning techniques to predict the intron and exon regions in filamentous fungi genes and to translate the identified regions into protein codons. The problem was modeled as a supervised learning problem, with training performed on a set of genes obtained from GenBank whose intron and exon regions were already identified. The machine learning model used in this work was the Conditional Random Field (CRF). The values of the metrics applied to the model show that it is possible to achieve good precision in identifying the intron and exon regions as well as the protein codons. Thus, although a greater diversity of database characteristics is needed to support the effectiveness of splice site identification, this paper gives evidence that these splice sites can be predicted with good accuracy.
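The last step the abstract mentions, turning predicted regions into codons, can be illustrated with a minimal sketch. It assumes the labelling model emits one label per nucleotide, with 'E' for exon and 'I' for intron; this label alphabet, the function names `splice_exons` and `to_codons`, and the toy gene are hypothetical and are not taken from the paper, which does not describe its label scheme or features here.

```python
from typing import List


def splice_exons(sequence: str, labels: str) -> str:
    """Keep only the bases labelled as exon ('E'); drop intron bases ('I')."""
    if len(sequence) != len(labels):
        raise ValueError("sequence and labels must have the same length")
    return "".join(base for base, label in zip(sequence, labels) if label == "E")


def to_codons(coding_sequence: str) -> List[str]:
    """Split a coding sequence into consecutive triplets.

    A trailing partial codon (1 or 2 bases) is discarded.
    """
    usable = len(coding_sequence) - len(coding_sequence) % 3
    return [coding_sequence[i:i + 3] for i in range(0, usable, 3)]


# Toy gene (hypothetical): two exons separated by a 10-base intron
# that starts with GT and ends with AG.
gene = "ATGGCC" + "GTAAGTTTAG" + "GATTAA"
# Per-base labels as a sequence labeller might emit them (assumed format).
labels = "E" * 6 + "I" * 10 + "E" * 6

coding = splice_exons(gene, labels)
codons = to_codons(coding)
print(coding)
print(codons)
```

In this sketch a single mislabelled base at an exon boundary shifts the reading frame of every downstream codon, which is one reason boundary accuracy matters for the translation step.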