Author: Robertas Damaševičius
Published in: 2008 4th International IEEE Conference Intelligent Systems, 2008-11-11
DOI: https://doi.org/10.1109/IS.2008.4670503
Analysis of binary feature mapping rules for promoter recognition in imbalanced DNA sequence datasets using Support Vector Machine
Recognition of specific, functionally important DNA sequence fragments is considered one of the most important problems in bioinformatics. One such type of fragment is the promoter, a short regulatory DNA sequence located upstream of a gene. Detecting promoters in DNA sequences is important for successful gene prediction. In this paper, a machine learning method, the support vector machine (SVM), is used to classify DNA sequences and recognize promoters. To find the best classification, 11 rules for mapping DNA sequences into a binary SVM feature space are analyzed. Classification is performed using a power series kernel function. Kernel parameters are optimized using a modification of the Nelder-Mead (downhill simplex) optimization method. Classification results for Drosophila and human sequence datasets are presented.
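To make the idea of mapping a DNA sequence into a binary feature space concrete, the following is a minimal sketch of one possible rule: a one-hot (orthogonal) encoding that spends 4 bits per nucleotide. The abstract does not list the 11 rules, so this particular encoding, the `ONE_HOT` table, and the `encode_sequence` helper are illustrative assumptions, not the paper's method.

```python
# Hypothetical mapping rule: one-hot encoding, 4 bits per nucleotide.
# Assumption for illustration only; the abstract does not specify the 11 rules.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def encode_sequence(seq):
    """Map a DNA string to a flat binary feature vector of length 4 * len(seq)."""
    features = []
    for base in seq.upper():
        # Symbols outside A/C/G/T (e.g. 'N') are mapped to all zeros.
        features.extend(ONE_HOT.get(base, (0, 0, 0, 0)))
    return features


if __name__ == "__main__":
    # Encode a short example fragment and print its binary feature vector.
    print(encode_sequence("TATA"))
```

A vector produced this way could be passed to any SVM implementation as one training sample. Other binary rules (for example, 2 bits per nucleotide) would change only the lookup table, which is why the choice of mapping can be compared independently of the classifier.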