A Task-Based Approach for Large-Scale Evaluation of the Gene Ontology

Salvatore Loguercio, Erik L. Clarke, Benjamin M. Good, A. Su
DOI: 10.1109/HISB.2012.69 (https://doi.org/10.1109/HISB.2012.69)
Published in: 2012 IEEE Second International Conference on Healthcare Informatics, Imaging and Systems Biology
Publication date: 2012-09-27
Citations: 2

Abstract

A Task-Based Approach for Large-Scale Evaluation of the Gene Ontology
The Gene Ontology (GO) provides a framework to systematically classify and annotate gene function. The annotations associated with GO play a critical role in modern biology and cover many organisms. For the human genome, over 10,000 GO terms are used to annotate gene function in an expansive database of over 200,000 annotations. Due to the importance of the GO annotations in modern biology, significant effort has been put into assessing the quality of the annotations. Providing measures of annotation completeness, accuracy, and precision is critical if researchers are to use the annotations in real-world applications with confidence. Here, we describe a task-based approach that examines the completeness and utility of GO annotations through the lens of gene enrichment analysis. Our approach can be used to model the progression of the GO annotations over time, either for a particular area of interest or for the body of annotations as a whole. Using this framework, we conducted a large-scale analysis of gene expression datasets from the NCBI Gene Expression Omnibus (GEO). In particular, we identified terms of interest for each dataset through semantic annotation of biomedical data, then tracked the behavior of these terms as a function of time. The preliminary results provide significant information about the progress and character of GO annotations over time. This framework is flexible enough to examine all or part of the GO annotations, across multiple species, and with various enrichment methods. We also discuss how this framework can be used to evaluate different annotation methods. For example, by comparing the performance of annotations generated with a particular method to the performance of canonical annotations, it is possible to determine their relative quality.
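The idea of tracking a term's enrichment across annotation releases can be illustrated with a minimal sketch. It assumes a one-sided hypergeometric test as the enrichment method (the abstract does not say which method the authors used), and the function names, gene identifiers, and yearly snapshots below are hypothetical toy data, not taken from the paper.

```python
from math import comb


def hypergeom_enrichment_p(study_hits, study_size, annotated, background_size):
    """One-sided hypergeometric p-value: P(X >= study_hits).

    X is the number of term-annotated genes seen when `study_size` genes are
    drawn from a background of `background_size` genes, of which `annotated`
    carry the term.
    """
    upper = min(study_size, annotated)
    total = comb(background_size, study_size)
    tail = sum(
        comb(annotated, k) * comb(background_size - annotated, study_size - k)
        for k in range(study_hits, upper + 1)
    )
    return tail / total


def track_term(study_genes, background, snapshots, term):
    """Return {year: p-value} for one GO term across annotation snapshots.

    `snapshots` maps a year to {term: set of annotated genes}; this layout is
    an assumption made for illustration. `study_genes` must be a subset of
    `background`.
    """
    result = {}
    for year in sorted(snapshots):
        annotated = snapshots[year].get(term, set()) & background
        hits = len(annotated & study_genes)
        result[year] = hypergeom_enrichment_p(
            hits, len(study_genes), len(annotated), len(background)
        )
    return result


# Toy data (hypothetical): 20 background genes, 5 of them differentially expressed.
background = {f"g{i}" for i in range(20)}
study_genes = {"g0", "g1", "g2", "g3", "g4"}
term = "GO:0006915"  # used here only as a placeholder term of interest
snapshots = {
    2005: {term: {"g0", "g10"}},
    2010: {term: {"g0", "g1", "g2", "g10"}},
}

p_by_year = track_term(study_genes, background, snapshots, term)
for year, p in p_by_year.items():
    print(year, round(p, 4))
```

In this toy setting the p-value for the term of interest drops as more of the study genes become annotated with it, which is the kind of time-dependent behavior the framework is described as tracking.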