HPClas: A data‐driven approach for identifying halophilic proteins based on catBoost

Journal impact factor: 4.5 (JCR Q1, Microbiology)
mLife, published 2024-07-20, DOI: 10.1002/mlf2.12125
Shantong Hu, Xiao-Yong Wang, Zhikang Wang, Menghan Jiang, Shihui Wang, Wenya Wang, Jiangning Song, Guimin Zhang

Abstract

Halophilic proteins possess unique structural properties and remain highly stable under extreme conditions. This characteristic makes them valuable in application areas such as bioenergy, pharmaceuticals, environmental clean-up, and energy production. Halophilic proteins are generally discovered and characterized through labor-intensive, time-consuming wet-lab experiments. In this study, we introduce the Halophilic Protein Classifier (HPClas), a machine learning-based classifier built with the catBoost ensemble learning technique to identify halophilic proteins. In extensive in silico experiments on a large public dataset of 12,574 samples, HPClas achieved an area under the receiver operating characteristic curve (AUROC) of 0.844 on an independent test set of 200 samples. The source code and curated dataset of HPClas are publicly available at https://github.com/Showmake2/HPClas. In conclusion, HPClas is a promising tool for identifying halophilic proteins and may help accelerate their application in different fields.
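The headline result above is reported as AUROC. As an illustration of what that number measures, here is a minimal sketch that computes AUROC through its Mann-Whitney interpretation: the probability that a randomly chosen positive (halophilic) example is scored higher than a randomly chosen negative one, with ties counted as half. The function name `auroc` and the toy labels and scores are hypothetical; this is not the HPClas evaluation code, which may compute the metric differently (for example with a library routine).

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise (Mann-Whitney) comparison.

    labels: 1 for a positive (e.g. halophilic) protein, 0 for a negative one.
    scores: classifier scores; higher means "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:          # compare every positive score...
        for n in neg:      # ...against every negative score
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy example: 3 of the 4 positive/negative pairs are ranked correctly.
    print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

Read this way, an AUROC of 0.844 on the 200-sample test set means that, for roughly 84% of halophilic/non-halophilic pairs, the classifier assigns the halophilic protein the higher score. The pairwise loop is quadratic, which is fine for a test set of this size; larger sets would normally use a rank-based formulation.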