Biosequence Classification using Sequential Pattern Mining and Optimization
D. Fotiadis, T. Exarchos, M. Tsipouras, C. Papaloukas
2007 6th International Special Topic Conference on Information Technology Applications in Biomedicine
Published: 2007-12-26
DOI: 10.1109/ITAB.2007.4407423 (https://doi.org/10.1109/ITAB.2007.4407423)
Citations: 2
Abstract
We present a methodology for biosequence classification that combines sequential pattern mining with optimization. In the first stage, a sequential pattern mining algorithm is applied to a set of biological sequences to extract sequential patterns. A scoring function then gives the score of each pattern with respect to each sequence, and the score of each class under consideration is estimated. The pattern and class scores are then updated by multiplying them by a weight. In the second stage, an optimization technique computes the weight values that maximize classification accuracy. The methodology is applied to the protein class and fold prediction problem and is evaluated extensively on a dataset obtained from the Protein Data Bank.
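The two-stage idea can be illustrated with a minimal Python sketch. The abstract does not name the mining algorithm, the scoring function, or the optimization technique, so everything below is an assumption made for illustration only: a brute-force miner over fixed-length patterns (`mine_patterns`), a binary "pattern occurs as a subsequence" score (`is_subsequence`), class scores formed as weighted sums, and a simple coordinate search over a small weight grid (`optimize_weights`). The sequences and class labels are toy data, not taken from the Protein Data Bank.

```python
from itertools import product


def is_subsequence(pattern, sequence):
    """Assumed scoring function: True if pattern occurs in order, gaps allowed."""
    it = iter(sequence)
    return all(ch in it for ch in pattern)


def mine_patterns(sequences, length=2, min_support=2):
    """Toy stage-one miner: keep fixed-length patterns found in >= min_support sequences."""
    alphabet = sorted(set("".join(sequences)))
    candidates = ("".join(p) for p in product(alphabet, repeat=length))
    return [p for p in candidates
            if sum(is_subsequence(p, s) for s in sequences) >= min_support]


def class_scores(sequence, class_patterns, weights):
    """Score each class as the weighted sum of its patterns' scores on the sequence."""
    return {c: sum(weights[(c, p)] * is_subsequence(p, sequence) for p in pats)
            for c, pats in class_patterns.items()}


def predict(sequence, class_patterns, weights):
    """Pick the highest-scoring class; ties go to the alphabetically first label."""
    scores = class_scores(sequence, class_patterns, weights)
    return max(sorted(scores), key=scores.get)


def accuracy(data, class_patterns, weights):
    """Fraction of (sequence, label) pairs classified correctly."""
    return sum(predict(s, class_patterns, weights) == y for s, y in data) / len(data)


def optimize_weights(data, class_patterns, grid=(0.0, 0.5, 1.0, 2.0), sweeps=3):
    """Stand-in stage-two optimizer: coordinate search that keeps only improving changes."""
    weights = {(c, p): 1.0 for c, pats in class_patterns.items() for p in pats}
    best = accuracy(data, class_patterns, weights)
    for _ in range(sweeps):
        for key in weights:
            for value in grid:
                old = weights[key]
                weights[key] = value
                acc = accuracy(data, class_patterns, weights)
                if acc > best:
                    best = acc          # keep the improving weight
                else:
                    weights[key] = old  # otherwise revert
    return weights, best


# Toy labelled sequences (hypothetical, for illustration only).
train = [("AABAB", "alpha"), ("ABABA", "alpha"),
         ("BBCBC", "beta"), ("CBCBB", "beta")]

# Stage one: mine patterns separately for each class.
class_patterns = {label: mine_patterns([s for s, y in train if y == label])
                  for label in ("alpha", "beta")}

# Stage two: tune the pattern weights for classification accuracy.
uniform = {(c, p): 1.0 for c, pats in class_patterns.items() for p in pats}
baseline_acc = accuracy(train, class_patterns, uniform)
weights, train_acc = optimize_weights(train, class_patterns)
```

Because the search only accepts weight changes that raise accuracy, `train_acc` can never fall below `baseline_acc`. A real evaluation would measure accuracy on held-out sequences rather than on the training set, and would likely use a dedicated sequential pattern miner and a proper optimizer in place of these stand-ins.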