GOstruct 2.0: Automated Protein Function Prediction for Annotated Proteins

Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics. Pub Date: 2017-08-20. DOI: 10.1145/3107411.3107417

Indika Kahanda, A. Ben-Hur


Citations: 9

Abstract

Automated protein function prediction is the task of automatically predicting functional annotations for a protein based on gold-standard annotations derived from experimental assays. These experiment-based annotations accumulate over time: proteins without annotations get annotated, and new functions of already annotated proteins are discovered. Function prediction can therefore be viewed as a combination of two sub-tasks: making predictions on previously annotated proteins and making predictions on previously unannotated proteins. In previous work, we analyzed the performance of several protein function prediction methods in these two scenarios. Our results showed that GOstruct, which is based on the structured output framework, had lower accuracy when predicting annotations for proteins with existing annotations, while its performance on unannotated proteins was similar to its performance in cross-validation. In this work, we present GOstruct 2.0, which includes improvements that allow the model to use information about a protein's current annotations to better handle the task of predicting novel annotations for previously annotated proteins. This is especially important for model organisms, where most proteins have some level of annotation. Experimental results on human data show that GOstruct 2.0 outperforms the original GOstruct in this task, demonstrating the effectiveness of the proposed improvements. This is the first study that focuses on adapting the structured output framework to applications in which labels are incomplete by nature.
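The two sub-tasks described in the abstract can be made concrete with a small sketch. It assumes two annotation snapshots taken at different times, each mapping a protein ID to a set of GO term IDs, and separates the newly acquired annotations into those for previously unannotated proteins and those for previously annotated ones. The function name, the snapshot layout, and the toy data are illustrative assumptions rather than anything taken from the paper, and the sketch ignores propagation of terms through the GO hierarchy.

```python
from typing import Dict, Set, Tuple

# Assumed layout: protein ID -> set of GO term IDs (hypothetical, for illustration)
Annotations = Dict[str, Set[str]]


def split_by_prior_knowledge(
    earlier: Annotations, later: Annotations
) -> Tuple[Annotations, Annotations]:
    """Split novel annotations into the two scenarios from the abstract.

    Returns a pair (unannotated_targets, annotated_targets), where each maps
    a protein to the GO terms it gained between the two snapshots.
    """
    unannotated_targets: Annotations = {}
    annotated_targets: Annotations = {}
    for protein, terms_later in later.items():
        terms_earlier = earlier.get(protein, set())
        novel = terms_later - terms_earlier
        if not novel:
            continue  # no new function was recorded for this protein
        if terms_earlier:
            # protein already had annotations in the earlier snapshot
            annotated_targets[protein] = novel
        else:
            # protein had no annotations in the earlier snapshot
            unannotated_targets[protein] = novel
    return unannotated_targets, annotated_targets


if __name__ == "__main__":
    # Toy snapshots, not real experimental data
    earlier = {"P1": {"GO:0003677"}, "P2": {"GO:0005515"}}
    later = {
        "P1": {"GO:0003677", "GO:0006355"},
        "P2": {"GO:0005515"},
        "P3": {"GO:0016301"},
    }
    unannotated, annotated = split_by_prior_knowledge(earlier, later)
    print("previously unannotated:", unannotated)
    print("previously annotated:", annotated)
```

In this toy case P1 falls into the previously annotated scenario, P3 into the previously unannotated one, and P2 is skipped because it gained nothing. A predictor would then be scored separately on each group.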


