Muhammad Shujaat, Hilal Tayara, Sunggoo Yoo, Kil To Chong
{"title":"iProm-Yeast:基于ML堆叠的酵母启动子预测工具","authors":"Muhammad Shujaat, Hilal Tayara, Sunggoo Yoo, Kil To Chong","doi":"10.2174/0115748936256869231019113616","DOIUrl":null,"url":null,"abstract":"Background and Objective:: Gene promoters play a crucial role in regulating gene transcription by serving as DNA regulatory elements near transcription start sites. Despite numerous approaches, including alignment signal and content-based methods for promoter prediction, accurately identifying promoters remains challenging due to the lack of explicit features in their sequences. Consequently, many machine learning and deep learning models for promoter identification have been presented, but the performance of these tools is not precise. Most recent investigations have concentrated on identifying sigma or plant promoters. While the accurate identification of Saccharomyces cerevisiae promoters remains an underexplored area. In this study, we introduced “iPromyeast”, a method for identifying yeast promoters. Using genome sequences from the eukaryotic yeast Saccharomyces cerevisiae, we investigate vector encoding and promoter classification. Additionally, we developed a more difficult negative set by employing promoter sequences rather than nonpromoter regions of the genome. The newly developed negative reconstruction approach improves classification and minimizes the amount of false positive predictions. Methods:: To overcome the problems associated with promoter prediction, we investigate alternate vector encoding and feature extraction methodologies. Following that, these strategies are coupled with several machine learning algorithms and a 1-D convolutional neural network model. Our results show that the pseudo-dinucleotide composition is preferable for feature encoding and that the machine- learning stacking approach is excellent for accurate promoter categorization. Furthermore, we provide a negative reconstruction method that uses promoter sequences rather than non-promoter regions, resulting in higher classification performance and fewer false positive predictions. Results:: Based on the results of 5-fold cross-validation, the proposed predictor, iProm-Yeast, has a good potential for detecting Saccharomyces cerevisiae promoters. The accuracy (Acc) was 86.27%, the sensitivity (Sn) was 82.29%, the specificity (Sp) was 89.47%, the Matthews correlation coefficient (MCC) was 0.72, and the area under the receiver operating characteristic curve (AUROC) was 0.98. We also performed a cross-species analysis to determine the generalizability of iProm-Yeast across other species. Conclusion:: iProm-Yeast is a robust method for accurately identifying Saccharomyces cerevisiae promoters. With advanced vector encoding techniques and a negative reconstruction approach, it achieves improved classification accuracy and reduces false positive predictions. In addition, it offers researchers a reliable and precise webserver to study gene regulation in diverse organisms.","PeriodicalId":10801,"journal":{"name":"Current Bioinformatics","volume":"2 1","pages":""},"PeriodicalIF":2.4000,"publicationDate":"2023-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"iProm-Yeast: Prediction Tool for Yeast Promoters Based on ML Stacking\",\"authors\":\"Muhammad Shujaat, Hilal Tayara, Sunggoo Yoo, Kil To Chong\",\"doi\":\"10.2174/0115748936256869231019113616\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Background and Objective:: Gene promoters play a crucial role in regulating gene transcription by serving as DNA regulatory elements near transcription start sites. Despite numerous approaches, including alignment signal and content-based methods for promoter prediction, accurately identifying promoters remains challenging due to the lack of explicit features in their sequences. Consequently, many machine learning and deep learning models for promoter identification have been presented, but the performance of these tools is not precise. Most recent investigations have concentrated on identifying sigma or plant promoters. While the accurate identification of Saccharomyces cerevisiae promoters remains an underexplored area. In this study, we introduced “iPromyeast”, a method for identifying yeast promoters. Using genome sequences from the eukaryotic yeast Saccharomyces cerevisiae, we investigate vector encoding and promoter classification. Additionally, we developed a more difficult negative set by employing promoter sequences rather than nonpromoter regions of the genome. The newly developed negative reconstruction approach improves classification and minimizes the amount of false positive predictions. Methods:: To overcome the problems associated with promoter prediction, we investigate alternate vector encoding and feature extraction methodologies. Following that, these strategies are coupled with several machine learning algorithms and a 1-D convolutional neural network model. Our results show that the pseudo-dinucleotide composition is preferable for feature encoding and that the machine- learning stacking approach is excellent for accurate promoter categorization. Furthermore, we provide a negative reconstruction method that uses promoter sequences rather than non-promoter regions, resulting in higher classification performance and fewer false positive predictions. Results:: Based on the results of 5-fold cross-validation, the proposed predictor, iProm-Yeast, has a good potential for detecting Saccharomyces cerevisiae promoters. The accuracy (Acc) was 86.27%, the sensitivity (Sn) was 82.29%, the specificity (Sp) was 89.47%, the Matthews correlation coefficient (MCC) was 0.72, and the area under the receiver operating characteristic curve (AUROC) was 0.98. We also performed a cross-species analysis to determine the generalizability of iProm-Yeast across other species. Conclusion:: iProm-Yeast is a robust method for accurately identifying Saccharomyces cerevisiae promoters. With advanced vector encoding techniques and a negative reconstruction approach, it achieves improved classification accuracy and reduces false positive predictions. In addition, it offers researchers a reliable and precise webserver to study gene regulation in diverse organisms.\",\"PeriodicalId\":10801,\"journal\":{\"name\":\"Current Bioinformatics\",\"volume\":\"2 1\",\"pages\":\"\"},\"PeriodicalIF\":2.4000,\"publicationDate\":\"2023-12-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Current Bioinformatics\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.2174/0115748936256869231019113616\",\"RegionNum\":3,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"BIOCHEMICAL RESEARCH METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Current Bioinformatics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.2174/0115748936256869231019113616","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
iProm-Yeast: Prediction Tool for Yeast Promoters Based on ML Stacking
Background and Objective:: Gene promoters play a crucial role in regulating gene transcription by serving as DNA regulatory elements near transcription start sites. Despite numerous approaches, including alignment signal and content-based methods for promoter prediction, accurately identifying promoters remains challenging due to the lack of explicit features in their sequences. Consequently, many machine learning and deep learning models for promoter identification have been presented, but the performance of these tools is not precise. Most recent investigations have concentrated on identifying sigma or plant promoters. While the accurate identification of Saccharomyces cerevisiae promoters remains an underexplored area. In this study, we introduced “iPromyeast”, a method for identifying yeast promoters. Using genome sequences from the eukaryotic yeast Saccharomyces cerevisiae, we investigate vector encoding and promoter classification. Additionally, we developed a more difficult negative set by employing promoter sequences rather than nonpromoter regions of the genome. The newly developed negative reconstruction approach improves classification and minimizes the amount of false positive predictions. Methods:: To overcome the problems associated with promoter prediction, we investigate alternate vector encoding and feature extraction methodologies. Following that, these strategies are coupled with several machine learning algorithms and a 1-D convolutional neural network model. Our results show that the pseudo-dinucleotide composition is preferable for feature encoding and that the machine- learning stacking approach is excellent for accurate promoter categorization. Furthermore, we provide a negative reconstruction method that uses promoter sequences rather than non-promoter regions, resulting in higher classification performance and fewer false positive predictions. Results:: Based on the results of 5-fold cross-validation, the proposed predictor, iProm-Yeast, has a good potential for detecting Saccharomyces cerevisiae promoters. The accuracy (Acc) was 86.27%, the sensitivity (Sn) was 82.29%, the specificity (Sp) was 89.47%, the Matthews correlation coefficient (MCC) was 0.72, and the area under the receiver operating characteristic curve (AUROC) was 0.98. We also performed a cross-species analysis to determine the generalizability of iProm-Yeast across other species. Conclusion:: iProm-Yeast is a robust method for accurately identifying Saccharomyces cerevisiae promoters. With advanced vector encoding techniques and a negative reconstruction approach, it achieves improved classification accuracy and reduces false positive predictions. In addition, it offers researchers a reliable and precise webserver to study gene regulation in diverse organisms.
期刊介绍:
Current Bioinformatics aims to publish all the latest and outstanding developments in bioinformatics. Each issue contains a series of timely, in-depth/mini-reviews, research papers and guest edited thematic issues written by leaders in the field, covering a wide range of the integration of biology with computer and information science.
The journal focuses on advances in computational molecular/structural biology, encompassing areas such as computing in biomedicine and genomics, computational proteomics and systems biology, and metabolic pathway engineering. Developments in these fields have direct implications on key issues related to health care, medicine, genetic disorders, development of agricultural products, renewable energy, environmental protection, etc.