RDR100: an effective computational method for identifying Krüppel-like factors

IF 2.9, Zone 3 (Biology), Q3 (Biochemical Research Methods)

Current Bioinformatics, Pub Date: 2023-09-05, DOI: 10.2174/1574893618666230905102407

Adeel Malik, Jamal S. M. Sabir, M. Kamli, Thi Phan Le, Chang-Bae Kim, Balachandran Manavalan


Citations: 0

Abstract


RDR100: an effective computational method for identifying Krüppel-like factors

Krüppel-like factors (KLFs) are a family of zinc finger-containing transcription factors that regulate various cellular processes. KLF proteins are associated with human diseases such as cancer, cardiovascular diseases, and metabolic disorders. The KLF family consists of 18 members with diverse expression profiles across numerous tissues. Accurate identification and annotation of KLF proteins is crucial, given their involvement in important biological functions. Although experimental approaches can identify KLF proteins precisely, large-scale identification is complicated, slow, and expensive.

In this study, we developed RDR100, a novel random forest (RF)-based framework for predicting KLF proteins from their primary sequences. First, we identified the optimal encodings for ten different features using a recursive feature elimination approach, and then trained the corresponding models using five distinct machine learning (ML) classifiers. The performance of all models was assessed using independent datasets, and RDR100 was selected as the final model based on its consistent performance in cross-validation and independent evaluation.

Our results demonstrate that RDR100 is a robust predictor of KLF proteins. The RDR100 web server is available at https://procarb.org/RDR100/.
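The abstract states that RDR100 works from primary sequences through encoded features, but it does not name the ten features. As a rough illustration of what a sequence encoding can look like, the sketch below computes amino acid composition, a widely used encoding that is assumed here for illustration and is not confirmed to be one of RDR100's features. The function name, constant, and toy sequence are all hypothetical.

```python
from collections import Counter
from typing import List

# The 20 standard amino acids, in a fixed order so every vector has the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> List[float]:
    """Return the fraction of each standard amino acid in `sequence`.

    Hypothetical helper for illustration; not taken from RDR100.
    Residues outside the 20 standard letters (such as X) are ignored.
    """
    counts = Counter(r for r in sequence.upper() if r in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Made-up toy peptide, not a real KLF sequence; the trailing X is skipped.
toy_sequence = "CCHHKRCX"
vector = amino_acid_composition(toy_sequence)
```

Each protein, whatever its length, becomes a 20-dimensional vector whose entries sum to 1, which is the kind of fixed-length input that a classifier such as a random forest requires.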


Source journal

Current Bioinformatics (Biology: Biochemical Research Methods)

CiteScore

6.60

Self-citation rate

2.50%

Articles published

Review time

>12 weeks

About the journal: Current Bioinformatics aims to publish all the latest and outstanding developments in bioinformatics. Each issue contains a series of timely in-depth reviews, mini-reviews, research papers, and guest-edited thematic issues written by leaders in the field, covering a wide range of topics in the integration of biology with computer and information science. The journal focuses on advances in computational molecular and structural biology, encompassing areas such as computing in biomedicine and genomics, computational proteomics and systems biology, and metabolic pathway engineering. Developments in these fields have direct implications for key issues related to health care, medicine, genetic disorders, development of agricultural products, renewable energy, environmental protection, etc.