Adeel Malik, Jamal S. M. Sabir, M. Kamli, Thi Phan Le, Chang-Bae Kim, Balachandran Manavalan
{"title":"RDR100: an effective computational method for identifying Kruppel-like factors","authors":"Adeel Malik, Jamal S. M. Sabir, M. Kamli, Thi Phan Le, Chang-Bae Kim, Balachandran Manavalan","doi":"10.2174/1574893618666230905102407","DOIUrl":null,"url":null,"abstract":"\n\nKrüppel-like factors (KLFs) are a family of transcription factors containing zinc fingers that regulate various cellular processes. KLF proteins are associated with human diseases, such as cancer, cardiovascular diseases, and metabolic disorders. The KLF family consists of 18 members with diverse expression profiles across numerous tissues. Accurate identification and annotation of KLF proteins is crucial, given their involvement in important biological functions. Although experimental approaches can identify KLF proteins precisely, large-scale identification is complicated, slow, and expensive.\n\n\n\nIn this study, we developed RDR100, a novel random forest (RF)-based framework for predicting KLF proteins based on their primary sequences. First, we identified the optimal encodings for ten different features using a recursive feature elimination approach, and then trained their respective model using five distinct machine learning (ML) classifiers. Results: The performance of all models was assessed using independent datasets, and RDR100 was selected as the final model based on its consistent performance in cross-validation and independent evaluation.\n\n\n\nOur results demonstrate that RDR100 is a robust predictor of KLF proteins. RDR100 web server is available at https://procarb.org/RDR100/.\n","PeriodicalId":10801,"journal":{"name":"Current Bioinformatics","volume":" ","pages":""},"PeriodicalIF":2.4000,"publicationDate":"2023-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Current Bioinformatics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.2174/1574893618666230905102407","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 0
Abstract
Krüppel-like factors (KLFs) are a family of transcription factors containing zinc fingers that regulate various cellular processes. KLF proteins are associated with human diseases, such as cancer, cardiovascular diseases, and metabolic disorders. The KLF family consists of 18 members with diverse expression profiles across numerous tissues. Accurate identification and annotation of KLF proteins is crucial, given their involvement in important biological functions. Although experimental approaches can identify KLF proteins precisely, large-scale identification is complicated, slow, and expensive.
In this study, we developed RDR100, a novel random forest (RF)-based framework for predicting KLF proteins based on their primary sequences. First, we identified the optimal encodings for ten different features using a recursive feature elimination approach, and then trained their respective model using five distinct machine learning (ML) classifiers. Results: The performance of all models was assessed using independent datasets, and RDR100 was selected as the final model based on its consistent performance in cross-validation and independent evaluation.
Our results demonstrate that RDR100 is a robust predictor of KLF proteins. RDR100 web server is available at https://procarb.org/RDR100/.
期刊介绍:
Current Bioinformatics aims to publish all the latest and outstanding developments in bioinformatics. Each issue contains a series of timely, in-depth/mini-reviews, research papers and guest edited thematic issues written by leaders in the field, covering a wide range of the integration of biology with computer and information science.
The journal focuses on advances in computational molecular/structural biology, encompassing areas such as computing in biomedicine and genomics, computational proteomics and systems biology, and metabolic pathway engineering. Developments in these fields have direct implications on key issues related to health care, medicine, genetic disorders, development of agricultural products, renewable energy, environmental protection, etc.