Exploiting PubMed for Protein Molecular Function Prediction via NMF Based Multi-label Classification

S. Fodeh, Aditya Tiwari, Hong Yu
Published in: 2017 IEEE International Conference on Data Mining Workshops (ICDMW), 2017-11-01
DOI: 10.1109/ICDMW.2017.64 (https://doi.org/10.1109/ICDMW.2017.64)
Citations: 4

Abstract

The Gene Ontology (GO) defines the terms and classes used to describe gene functions and the relationships between them. GO has become the standard for describing the functions of specific genes in different model organisms. GO annotation, which tags genes with GO terms, has mostly been a manual and time-consuming curation process. In this paper we describe the development and evaluation of an innovative predictive system that automatically assigns a gene its molecular functions (GO terms) using biomedical literature as a resource. We treated GO term assignment as a multi-label, multi-class classification problem. Rather than the commonly used bag-of-words approach, we used non-negative matrix factorization (NMF) for feature reduction and then performed the classification of genes. To address the multi-label aspect of the data, we used the binary-relevance method. We experimented with different classifiers and found that the combination of binary relevance and a K-nearest neighbor (KNN) classifier gave the best performance. Our evaluation on the UniProtKB/Swiss-Prot dataset showed a best F-measure of 0.83.
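The pipeline outlined in the abstract (NMF for feature reduction, then binary relevance with a KNN classifier, scored by F-measure) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The term-count matrix and labels are made-up toy data, and the function names `nmf`, `knn_binary_relevance`, and `micro_f1` are hypothetical. The sketch also makes several assumptions the abstract does not confirm: NMF is fitted with multiplicative updates for squared error, all proteins (including the held-out ones) are factorized together for simplicity, the F-measure is micro-averaged, and the rank and neighbor counts are chosen arbitrarily.

```python
import random

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Plain-Python product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def nmf(V, rank, iters=200, seed=0, eps=1e-9):
    """Approximate non-negative V (n x m) as W (n x rank) times H (rank x m)
    using multiplicative updates for squared reconstruction error."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H

def knn_binary_relevance(X_train, Y_train, x, k=3):
    """Binary relevance: each label gets its own yes/no majority vote
    among the k training points closest to x (squared Euclidean distance)."""
    order = sorted(range(len(X_train)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(X_train[i], x)))
    nearest = order[:k]
    n_labels = len(Y_train[0])
    return [1 if 2 * sum(Y_train[i][j] for i in nearest) > k else 0
            for j in range(n_labels)]

def micro_f1(Y_true, Y_pred):
    """Micro-averaged F-measure over all (protein, label) pairs."""
    pairs = [(t, p) for tr, pr in zip(Y_true, Y_pred) for t, p in zip(tr, pr)]
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Toy data (made up): rows are proteins, columns are term counts from abstracts.
V = [
    [3, 2, 1, 0, 0, 0], [2, 3, 1, 0, 0, 0],  # literature about function A
    [0, 0, 0, 2, 3, 1], [0, 0, 0, 3, 2, 1],  # literature about function B
    [2, 2, 1, 2, 2, 1], [3, 3, 1, 3, 2, 1],  # both functions
    [2, 2, 2, 0, 0, 0], [0, 0, 0, 2, 2, 2],  # held-out proteins
]
# One column per GO term (label); a protein may carry several labels.
Y = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [1, 1], [1, 0], [0, 1]]

W, H = nmf(V, rank=2)               # W holds the reduced features per protein
X_train, Y_train = W[:6], Y[:6]
X_test, Y_test = W[6:], Y[6:]
preds = [knn_binary_relevance(X_train, Y_train, x, k=3) for x in X_test]
print("predictions:", preds)
print("micro F-measure:", micro_f1(Y_test, preds))
```

Because every label is decided independently, a protein can receive zero, one, or several GO terms, which is what makes binary relevance suitable for the multi-label setting. In practice a library implementation of NMF and KNN would likely replace these hand-written loops, and held-out proteins would be projected onto a factorization fitted on training data only.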