{"title":"Exploiting PubMed for Protein Molecular Function Prediction via NMF Based Multi-label Classification","authors":"S. Fodeh, Aditya Tiwari, Hong Yu","doi":"10.1109/ICDMW.2017.64","DOIUrl":null,"url":null,"abstract":"Gene ontology (GO) defines terms and classes used to describe gene functions and relationships between them. GO has been the standard to describing the functions of specific genes in different model organisms. GO annotation which tags genes with GO terms has mostly been a manual and timeconsuming curation process. In this paper we describe the development and evaluation of an innovative predictive system to automatically assign a gene its molecular functions (GO terms) using biomedical literature as a resource. We treated a GO term assignment as a multi-label multi-class classification problem. Rather than the commonly used bag-of-words approach, we used non-negative matrix factorization (NMF) for feature reduction and then performed the classification of genes. To address the multi-label aspect of the data, we used the binary-relevance method. We experimented with different classifiers and found that the combination of binary relevance and K-nearest neighbor (KNN) classifier gave the best performance. Our evaluation on UniProtKB/Swiss-Prot dataset showed the best performance of .83 in terms of F-measure.","PeriodicalId":389183,"journal":{"name":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":"222 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE International Conference on Data Mining Workshops (ICDMW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDMW.2017.64","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4
Abstract
Gene ontology (GO) defines terms and classes used to describe gene functions and relationships between them. GO has been the standard to describing the functions of specific genes in different model organisms. GO annotation which tags genes with GO terms has mostly been a manual and timeconsuming curation process. In this paper we describe the development and evaluation of an innovative predictive system to automatically assign a gene its molecular functions (GO terms) using biomedical literature as a resource. We treated a GO term assignment as a multi-label multi-class classification problem. Rather than the commonly used bag-of-words approach, we used non-negative matrix factorization (NMF) for feature reduction and then performed the classification of genes. To address the multi-label aspect of the data, we used the binary-relevance method. We experimented with different classifiers and found that the combination of binary relevance and K-nearest neighbor (KNN) classifier gave the best performance. Our evaluation on UniProtKB/Swiss-Prot dataset showed the best performance of .83 in terms of F-measure.