Roman Eisner, B. Poulin, D. Szafron, P. Lu, R. Greiner
{"title":"利用基因本体的层次结构改进蛋白质功能预测","authors":"Roman Eisner, B. Poulin, D. Szafron, P. Lu, R. Greiner","doi":"10.1109/CIBCB.2005.1594940","DOIUrl":null,"url":null,"abstract":"High performance and accurate protein function prediction is an important problem in molecular biology. Many contemporary ontologies, such as Gene Ontology (GO), have a hierarchical structure that can be exploited to improve the prediction accuracy, and lower the computational cost, of protein function prediction. We leverage the hierarchical structure of the ontology in two ways. First, we present a method of creating hierarchy-aware training sets for machine-learned classifiers and we show that, in the case of GO molecular function, it is the most accurate method compared to not considering the hierarchy during training. Second, we use the hierarchy to reduce the computational cost of classification. We also introduce a sound methodology for evaluating hierarchical classifiers using global cross-validation. Biologists often use sequence similarity (e.g. BLAST) to identify a \" nearest neighbor\" sequence and use the database annotations of this neighbor to predict protein function. In these cases, we use the hierarchy to improve accuracy by a small amount. When no similar sequences can be found (which is true for up to 40% of some common proteomes), our technique can improve accuracy by a more significant amount. Although this paper focuses on a specific important application-protein function prediction for the GO hierarchy-the techniques may be applied to any classification problem over a hierarchical ontology.","PeriodicalId":330810,"journal":{"name":"2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"103","resultStr":"{\"title\":\"Improving Protein Function Prediction using the Hierarchical Structure of the Gene Ontology\",\"authors\":\"Roman Eisner, B. Poulin, D. Szafron, P. Lu, R. Greiner\",\"doi\":\"10.1109/CIBCB.2005.1594940\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"High performance and accurate protein function prediction is an important problem in molecular biology. Many contemporary ontologies, such as Gene Ontology (GO), have a hierarchical structure that can be exploited to improve the prediction accuracy, and lower the computational cost, of protein function prediction. We leverage the hierarchical structure of the ontology in two ways. First, we present a method of creating hierarchy-aware training sets for machine-learned classifiers and we show that, in the case of GO molecular function, it is the most accurate method compared to not considering the hierarchy during training. Second, we use the hierarchy to reduce the computational cost of classification. We also introduce a sound methodology for evaluating hierarchical classifiers using global cross-validation. Biologists often use sequence similarity (e.g. BLAST) to identify a \\\" nearest neighbor\\\" sequence and use the database annotations of this neighbor to predict protein function. In these cases, we use the hierarchy to improve accuracy by a small amount. When no similar sequences can be found (which is true for up to 40% of some common proteomes), our technique can improve accuracy by a more significant amount. Although this paper focuses on a specific important application-protein function prediction for the GO hierarchy-the techniques may be applied to any classification problem over a hierarchical ontology.\",\"PeriodicalId\":330810,\"journal\":{\"name\":\"2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology\",\"volume\":\"6 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1900-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"103\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CIBCB.2005.1594940\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CIBCB.2005.1594940","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Improving Protein Function Prediction using the Hierarchical Structure of the Gene Ontology
High performance and accurate protein function prediction is an important problem in molecular biology. Many contemporary ontologies, such as Gene Ontology (GO), have a hierarchical structure that can be exploited to improve the prediction accuracy, and lower the computational cost, of protein function prediction. We leverage the hierarchical structure of the ontology in two ways. First, we present a method of creating hierarchy-aware training sets for machine-learned classifiers and we show that, in the case of GO molecular function, it is the most accurate method compared to not considering the hierarchy during training. Second, we use the hierarchy to reduce the computational cost of classification. We also introduce a sound methodology for evaluating hierarchical classifiers using global cross-validation. Biologists often use sequence similarity (e.g. BLAST) to identify a " nearest neighbor" sequence and use the database annotations of this neighbor to predict protein function. In these cases, we use the hierarchy to improve accuracy by a small amount. When no similar sequences can be found (which is true for up to 40% of some common proteomes), our technique can improve accuracy by a more significant amount. Although this paper focuses on a specific important application-protein function prediction for the GO hierarchy-the techniques may be applied to any classification problem over a hierarchical ontology.