Ana Carolina Lorena, Gustavo E. A. P. A. Batista, A. Carvalho, M. C. Monard
Title: The influence of noisy patterns on the performance of learning methods in the splice junction recognition problem
Venue: VII Brazilian Symposium on Neural Networks, 2002. SBRN 2002. Proceedings.
Published: 2002-11-11
DOI: 10.1109/SBRN.2002.1181431
Citations: 5
Abstract
Since the beginning of the Human Genome Project, which aims to sequence all human genetic information, a large amount of sequence data has been generated. Much attention is now given to the analysis of these data. A large part of this analysis is carried out with intelligent computational techniques. However, many genetic databases contain noisy data, which can degrade the performance of the computational techniques applied. This work studies the influence of noisy data on the training of three learning methods: decision trees, artificial neural networks, and support vector machines. The task investigated is the recognition of splice junctions in DNA sequences, which is part of the gene identification problem. Results indicate that eliminating noisy patterns from the dataset can improve the learning algorithms' performance, with no significant reduction in their generalization ability.
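The idea of eliminating noisy patterns before training can be illustrated with a minimal sketch. The abstract does not say which noise-detection heuristic was used, so the code below assumes a simple edited-nearest-neighbour rule: a pattern is dropped when its label disagrees with the majority label of its k closest patterns under Hamming distance. The function names, the class labels "EI" and "N", and the toy sequences are all hypothetical and are not taken from the paper's dataset.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def filter_noisy(patterns, labels, k=3):
    """Return indices of patterns whose label agrees with the majority
    label of their k nearest neighbours (Hamming distance).

    Assumption: this edited-nearest-neighbour rule stands in for the
    noise-elimination step; the abstract does not name the heuristic.
    """
    kept = []
    for i, (seq, label) in enumerate(zip(patterns, labels)):
        # Distances to every other pattern, nearest first
        # (ties are broken by pattern index).
        neighbours = sorted(
            (hamming(seq, other), j)
            for j, other in enumerate(patterns)
            if j != i
        )[:k]
        votes = Counter(labels[j] for _, j in neighbours)
        majority_label, _ = votes.most_common(1)[0]
        if majority_label == label:
            kept.append(i)
    return kept


# Toy data, made up for illustration only.
# "EI" = hypothetical exon/intron boundary class, "N" = neither.
patterns = [
    "CAGGTAAG", "CAGGTGAG", "AAGGTAAG",              # EI-like windows
    "CAGGTAAT",                                      # EI-like window labelled "N"
    "TTTTCCCC", "TTTTCCCA", "TTTACCCC", "TTTTCCGC",  # N-like windows
]
labels = ["EI", "EI", "EI", "N", "N", "N", "N", "N"]

kept = filter_noisy(patterns, labels, k=3)
print(kept)  # [0, 1, 2, 4, 5, 6, 7]
```

Pattern 3 is removed because its three nearest neighbours all carry the other label. The cleaned subset would then be passed to whichever learner is being evaluated (decision tree, neural network, or support vector machine), and its test performance compared against training on the unfiltered data.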