SProtFP: a machine learning-based method for functional classification of small ORFs in prokaryotes.

IF 4 Q1 GENETICS & HEREDITY
NAR Genomics and Bioinformatics. Pub Date: 2025-01-07. eCollection Date: 2025-03-01. DOI: 10.1093/nargab/lqae186
Akshay Khanduja, Debasisa Mohanty

Abstract

Small proteins (≤100 amino acids) play important roles across all life forms, from unicellular bacteria to higher organisms. In this study, we developed SProtFP, a machine learning-based method for assigning prokaryotic small proteins to selected functional categories. SProtFP uses independent artificial neural networks (ANNs), trained on a combination of physicochemical descriptors, to classify small proteins as type 2 antitoxin, bacteriocin, DNA-binding, metal-binding, ribosomal, RNA-binding, type 1 toxin or type 2 toxin proteins. We also trained a model to identify small open reading frame (smORF)-encoded antimicrobial peptides (AMPs). Comprehensive benchmarking of SProtFP yielded an average area under the receiver operating characteristic curve (ROC-AUC) of 0.92 in 10-fold cross-validation, and ROC-AUCs of 0.94 and 0.93 on held-out balanced and imbalanced test sets, respectively. Using the method to annotate bacterial isolates from the human gut microbiome, we identified thousands of remote homologs of known small protein families and assigned putative functions to uncharacterized proteins. This highlights the utility of SProtFP for large-scale functional annotation of microbiome datasets, especially where sequence homology is low. SProtFP is freely available at http://www.nii.ac.in/sprotfp.html and can be combined with genome annotation tools such as ProsmORF-pred to uncover the functional repertoire of novel small proteins in bacteria.
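
To make the two ingredients named in the abstract concrete (physicochemical descriptors as classifier input, and ROC-AUC as the benchmark metric), here is a minimal Python sketch. It is not the SProtFP implementation: the abstract does not list the descriptors actually used, so the feature set below (amino acid composition, mean Kyte-Doolittle hydropathy, and a crude net charge per residue) is an assumption chosen for illustration, and the function names `featurize` and `roc_auc` are hypothetical. Only the ≤100-residue cutoff comes from the text.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MAX_LEN = 100  # small-protein cutoff stated in the abstract

# Kyte-Doolittle hydropathy scale (an assumed descriptor, not confirmed by the source)
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def featurize(seq):
    """Return a 22-element feature vector for one small protein.

    Layout: 20 amino acid fractions (in AMINO_ACIDS order),
    mean hydropathy, net charge per residue.
    """
    seq = seq.upper()
    if not seq or len(seq) > MAX_LEN:
        raise ValueError("expected a sequence of 1 to 100 residues")
    if any(aa not in KD for aa in seq):
        raise ValueError("sequence contains a non-standard residue")
    counts = Counter(seq)
    n = len(seq)
    composition = [counts[aa] / n for aa in AMINO_ACIDS]
    hydropathy = sum(KD[aa] for aa in seq) / n
    # Crude charge estimate: K and R count +1, D and E count -1.
    charge = (counts["K"] + counts["R"] - counts["D"] - counts["E"]) / n
    return composition + [hydropathy, charge]


def roc_auc(labels, scores):
    """ROC-AUC as the probability that a positive outscores a negative.

    Ties count as half a win. Labels are 1 (positive) or 0 (negative).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))
```

A vector from `featurize` could be passed to any binary classifier, and `roc_auc` applied to its scores on a held-out set. The pairwise formulation also shows why ROC-AUC suits the imbalanced test set mentioned above: it compares positives against negatives directly, so the value does not depend on the class ratio in the way accuracy does.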
