iProm-Yeast: Prediction Tool for Yeast Promoters Based on ML Stacking

IF 2.4 · CAS Zone 3 (Biology) · JCR Q3, Biochemical Research Methods
Muhammad Shujaat, Hilal Tayara, Sunggoo Yoo, Kil To Chong
Current Bioinformatics, published 2023-12-04 · DOI: 10.2174/0115748936256869231019113616 · Citations: 0

Abstract

Background and Objective: Gene promoters are DNA regulatory elements located near transcription start sites, and they play a crucial role in regulating gene transcription. Despite numerous approaches to promoter prediction, including alignment-, signal-, and content-based methods, accurately identifying promoters remains challenging because their sequences lack explicit features. Consequently, many machine learning and deep learning models for promoter identification have been proposed, but their predictive performance remains limited. Most recent studies have concentrated on identifying sigma or plant promoters, whereas the accurate identification of Saccharomyces cerevisiae promoters remains underexplored. In this study, we introduce “iProm-Yeast”, a method for identifying yeast promoters. Using genome sequences from the eukaryotic yeast Saccharomyces cerevisiae, we investigate vector encoding and promoter classification. Additionally, we constructed a more difficult negative set from promoter sequences rather than from non-promoter regions of the genome. This negative reconstruction approach improves classification and reduces the number of false-positive predictions.

Methods: To overcome the problems associated with promoter prediction, we investigate alternative vector encoding and feature extraction methods. These strategies are then combined with several machine learning algorithms and a one-dimensional (1-D) convolutional neural network model. Our results show that pseudo-dinucleotide composition (PseDNC) is the preferable feature encoding and that a machine-learning stacking approach is highly effective for accurate promoter classification. Furthermore, the negative reconstruction method, which uses promoter sequences rather than non-promoter regions, yields higher classification performance and fewer false-positive predictions.
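The two data-preparation ideas above, building a harder negative set from promoter sequences and encoding sequences with pseudo-dinucleotide composition, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the iProm-Yeast implementation. The abstract does not give the exact negative-reconstruction procedure, so a split-and-substitute scheme (20 chunks, 12 overwritten with random bases) is assumed here. The PseDNC function follows the standard definition (16 normalized dinucleotide frequencies plus λ sequence-order correlation factors weighted by w), but the two "physicochemical" properties used in the usage example (G/C count and purine count per dinucleotide) are toy stand-ins for real property tables, and lam=3, w=0.05 are illustrative defaults rather than tuned values. The helper names make_negative, standardize, and psednc are hypothetical.

```python
import random
from itertools import product

BASES = "ACGT"
DINUCS = ["".join(p) for p in product(BASES, repeat=2)]  # AA, AC, ..., TT (16 dinucleotides)


def make_negative(promoter, n_chunks=20, n_replace=12, rng=None):
    """ASSUMED scheme: split a promoter into n_chunks pieces and overwrite
    n_replace randomly chosen pieces with random bases (length is preserved)."""
    rng = rng if rng is not None else random.Random()
    L = len(promoter)
    bounds = [round(k * L / n_chunks) for k in range(n_chunks + 1)]
    chunks = [promoter[bounds[k]:bounds[k + 1]] for k in range(n_chunks)]
    for k in rng.sample(range(n_chunks), n_replace):
        chunks[k] = "".join(rng.choice(BASES) for _ in chunks[k])
    return "".join(chunks)


def standardize(table):
    """Z-score each property column across the 16 dinucleotides."""
    n_props = len(table["AA"])
    out = {d: [0.0] * n_props for d in DINUCS}
    for v in range(n_props):
        col = [table[d][v] for d in DINUCS]
        mean = sum(col) / len(col)
        sd = (sum((x - mean) ** 2 for x in col) / len(col)) ** 0.5
        for d in DINUCS:
            out[d][v] = (table[d][v] - mean) / sd if sd > 0 else 0.0
    return out


def psednc(seq, props, lam=3, w=0.05):
    """PseDNC vector of length 16 + lam; props maps dinucleotide -> standardized values."""
    seq = seq.upper()
    if set(seq) - set(BASES):
        raise ValueError("only A/C/G/T are allowed")
    L = len(seq)
    if L < lam + 2:
        raise ValueError("sequence too short for the requested lambda")
    dins = [seq[i:i + 2] for i in range(L - 1)]
    freqs = [dins.count(d) / (L - 1) for d in DINUCS]  # normalized dinucleotide frequencies
    n_props = len(props["AA"])
    thetas = []
    for j in range(1, lam + 1):  # j-tier correlation factor theta_j
        total = 0.0
        for i in range(L - j - 1):
            a, b = dins[i], dins[i + j]
            total += sum((props[a][v] - props[b][v]) ** 2 for v in range(n_props)) / n_props
        thetas.append(total / (L - j - 1))
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


# Toy usage: stand-in properties (G/C count, purine count), NOT real physicochemical values.
toy_props = standardize({d: [sum(b in "GC" for b in d), sum(b in "AG" for b in d)] for d in DINUCS})
rng = random.Random(0)
pos = "".join(rng.choice(BASES) for _ in range(80))  # random stand-in for a real promoter
neg = make_negative(pos, rng=rng)
x_pos, x_neg = psednc(pos, toy_props), psednc(neg, toy_props)
print(len(x_pos), round(sum(x_pos), 6))  # 19 1.0
```

Each PseDNC vector has 16 + λ non-negative entries that sum to 1. Because the negatives keep part of the original promoter, they share local composition with the positives, which is what makes them harder to separate than random genomic background.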
Results: Based on 5-fold cross-validation, the proposed predictor, iProm-Yeast, shows good potential for detecting Saccharomyces cerevisiae promoters, with an accuracy (Acc) of 86.27%, a sensitivity (Sn) of 82.29%, a specificity (Sp) of 89.47%, a Matthews correlation coefficient (MCC) of 0.72, and an area under the receiver operating characteristic curve (AUROC) of 0.98. We also performed a cross-species analysis to assess how well iProm-Yeast generalizes to other species.

Conclusion: iProm-Yeast is a robust method for accurately identifying Saccharomyces cerevisiae promoters. Through advanced vector encoding techniques and a negative reconstruction approach, it achieves improved classification accuracy and fewer false-positive predictions. In addition, it offers researchers a reliable and precise web server for studying gene regulation in diverse organisms.
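The reported metrics follow the standard confusion-matrix definitions: Sn = TP/(TP+FN), Sp = TN/(TN+FP), Acc = (TP+TN)/(TP+TN+FP+FN), and MCC = (TP·TN − FP·FN)/√((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The stdlib-only sketch below computes them, estimates AUROC with the rank-based (Mann–Whitney) formulation, and generates 5-fold cross-validation splits. It is a hedged illustration, not the paper's evaluation code: the helper names are hypothetical, the folds are plainly shuffled rather than stratified, and the counts in the usage example are made up and unrelated to the paper's data.

```python
import math
import random


def binary_metrics(tp, fn, tn, fp):
    """Sn, Sp, Acc and MCC from confusion-matrix counts."""
    sn = tp / (tp + fn)                    # sensitivity: recall on promoters
    sp = tn / (tn + fp)                    # specificity: recall on non-promoters
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def kfold_indices(n, k=5, seed=0):
    """Yield (train, test) index lists for k shuffled, non-stratified folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, folds[f]


# Toy usage: counts pooled over the five folds (made-up numbers).
m = binary_metrics(tp=8, fn=2, tn=9, fp=1)
print({name: round(v, 4) for name, v in m.items()})
print(auroc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # 1.0
```

Pooling the confusion counts across folds before computing the metrics, rather than averaging per-fold values, is one reasonable convention; the abstract does not state which one was used.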
Source journal: Current Bioinformatics (Biology: Biochemical Research Methods)
CiteScore: 6.60
Self-citation rate: 2.50%
Articles published: 77
Review time: >12 weeks

About the journal: Current Bioinformatics aims to publish all the latest and outstanding developments in bioinformatics. Each issue contains a series of timely, in-depth/mini-reviews, research papers and guest edited thematic issues written by leaders in the field, covering a wide range of the integration of biology with computer and information science. The journal focuses on advances in computational molecular/structural biology, encompassing areas such as computing in biomedicine and genomics, computational proteomics and systems biology, and metabolic pathway engineering. Developments in these fields have direct implications for key issues related to health care, medicine, genetic disorders, development of agricultural products, renewable energy, environmental protection, etc.