Automated learning of genome sequences by computational intelligence
Authors: M.Q. Yang, J.Y. Yang, Zuojie Luo, O. Ersoy
Published in: 2005 ICSC Congress on Computational Intelligence Methods and Applications
Publication date: 2005-12-15
DOI: 10.1109/CIMA.2005.1662321 (https://doi.org/10.1109/CIMA.2005.1662321)
Citation count: 0
Abstract
The advent of high-throughput sequencing technology has led to an explosion in the amount of available DNA sequence data. The structures and functions of the proteins encoded by sequenced genomes remain largely unknown. Automated identification of protein functions and interactions has relied largely on known 3D structures or sequence homologues. In particular, intrinsically unstructured or disordered proteins lack specific 3D structures and are not conserved during evolution, yet they play central roles in diseases characterized by protein misfolding and aggregation. Can we assign protein functions to sequences without relying on 3D structures, and thereby provide useful information for the study of diseases? We developed machine learning techniques to rapidly assess protein functions from sequences. Assigning functional classes to proteins is complicated by the fact that a single protein can participate in several different pathways and can therefore have multiple functions (due to complex interactions among proteins). It follows that the instances in the resulting classification problem can carry multiple class labels. We have developed a tree-based classifier that is capable of classifying multiply-labeled data, and with it we gained insight into the multi-functional nature of proteins. The algorithm has been used with ensemble methods, in connection with other computational intelligence techniques, to form a committee machine. The results compare favorably with those achieved by algorithms such as decision trees and support vector machines.
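
The abstract does not say how the committee machine combines the outputs of its members. As a minimal sketch, assuming each member returns a set of functional labels for a protein and that labels are combined by per-label voting, the idea of a multi-label committee could look like the following. The function name `committee_vote`, the `min_votes` parameter, and the example labels are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import FrozenSet, Iterable, Optional


def committee_vote(
    predictions: Iterable[Iterable[str]],
    min_votes: Optional[int] = None,
) -> FrozenSet[str]:
    """Combine multi-label predictions from several classifiers.

    Each element of `predictions` is the set of labels one committee
    member assigned to the same instance. A label is kept if at least
    `min_votes` members predicted it (default: a strict majority).
    This voting rule is an assumption, not the paper's stated method.
    """
    label_sets = [frozenset(p) for p in predictions]
    if not label_sets:
        return frozenset()
    if min_votes is None:
        min_votes = len(label_sets) // 2 + 1  # strict majority
    # Count how many members voted for each label.
    votes = Counter(label for labels in label_sets for label in labels)
    return frozenset(label for label, n in votes.items() if n >= min_votes)


# Hypothetical outputs of three committee members for one protein.
member_predictions = [
    {"signaling", "transport"},
    {"signaling"},
    {"signaling", "transport", "metabolism"},
]

combined = committee_vote(member_predictions)
print(sorted(combined))  # ['signaling', 'transport']
```

Because the vote is taken per label, the combined prediction can itself carry several labels, which matches the multi-functional setting described above. Raising `min_votes` makes the committee more conservative.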