Automated learning of genome sequences by computational intelligence

2005 ICSC Congress on Computational Intelligence Methods and Applications. Pub Date: 2005-12-15. DOI: 10.1109/CIMA.2005.1662321

M.Q. Yang, J.Y. Yang, Zuojie Luo, O. Ersoy


Citations: 0

Abstract


Automated learning of genome sequences by computational intelligence

The advent of high-throughput sequencing technology has led to an explosion of available DNA sequence data. The structures and functions of the protein sequences encoded by sequenced genomes remain largely unknown. Automated identification of protein functions and interactions has largely relied on known 3D structures or sequence homologues. In particular, intrinsically unstructured or disordered proteins lack specific 3D structures and are not conserved during evolution, yet they play central roles in diseases characterized by protein misfolding and aggregation. Can we assign protein functions to sequences without relying on 3D structures, and so provide useful information for the study of diseases? We developed machine learning techniques to rapidly assess protein functions from sequences. Assigning functional classes to proteins is complicated by the fact that a single protein can participate in several different pathways and thus can have multiple functions (due to complex interactions among proteins). It follows that the instances in the resulting classification problem can carry multiple class labels. We have developed a tree-based classifier that is capable of classifying multiply-labeled data, and we have gained insight into the multi-functional nature of proteins. The algorithm has been used with ensemble methods, in connection with other computational intelligence techniques, to form a committee machine. The results compare favorably with those achieved by algorithms such as decision trees and support vector machines.
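The abstract does not give the details of the classifier, so the following is only a minimal sketch of the multi-label setting it describes, not the authors' algorithm. It assumes that each protein is represented by its amino-acid composition, that a tree split could be scored with a summed per-label binary entropy (one common way to adapt decision trees to multi-label data), and that a committee combines its members by per-label majority vote. All function names and the example labels are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Amino-acid composition: a 20-dimensional frequency vector (assumed feature set)."""
    counts = Counter(seq)
    n = max(len(seq), 1)  # avoid division by zero for an empty sequence
    return [counts.get(a, 0) / n for a in AMINO_ACIDS]


def multilabel_entropy(label_sets, all_labels):
    """Sum of binary entropies, one per label, over a set of multi-label instances.

    A tree learner could pick the split that most reduces this quantity.
    """
    n = len(label_sets)
    if n == 0:
        return 0.0
    total = 0.0
    for label in all_labels:
        p = sum(label in s for s in label_sets) / n  # fraction carrying this label
        for q in (p, 1.0 - p):
            if q > 0:
                total -= q * math.log2(q)
    return total


def committee_vote(predictions):
    """Keep a label if more than half of the committee members predict it."""
    votes = Counter(label for labels in predictions for label in set(labels))
    threshold = len(predictions) / 2
    return {label for label, v in votes.items() if v > threshold}


if __name__ == "__main__":
    # Three hypothetical committee members, each returning a set of function labels.
    members = [{"kinase", "transport"}, {"kinase"}, {"transport", "binding"}]
    print(sorted(committee_vote(members)))
```

Because each instance carries a set of labels rather than a single class, both the split score and the vote are computed label by label, which lets one protein end up with several predicted functions at once.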
