Integrating multi-encoding sequence features via stacking ensemble learning for RNA m5C site prediction
Authors: Ubaid Ur Rahman, Naeem Ul Islam
Journal: Nucleosides, Nucleotides & Nucleic Acids, pp. 1-29 (impact factor 1.3; JCR Q4, Biochemistry & Molecular Biology)
Published: 2026-04-20
DOI: 10.1080/15257770.2026.2658190 (https://doi.org/10.1080/15257770.2026.2658190)
Citations: 0
Abstract
RNA 5-methylcytosine (m5C) is an important epitranscriptomic modification involved in RNA stability, translation, and post-transcriptional regulation. Accurate identification of m5C sites remains challenging because existing computational methods rely on limited sequence representations and integrate features insufficiently. In this study, we propose a machine learning framework that integrates six complementary sequence encoding schemes: enhanced nucleic acid composition (ENAC), tri-nucleotide composition (TNC), composition of k-spaced nucleic acid pairs (CKSNAP), pseudo-electron-ion interaction potential (PseEIIP), one-hot encoding, and nucleotide chemical properties (NCP). Each encoding is paired with its best-performing classifier, and a stacking ensemble strategy is employed to fuse the outputs of the base classifiers. The base learners are trained with 5-fold cross-validation and the meta-learner with 3-fold cross-validation. Evaluation with multiple metrics shows that the proposed approach achieves improved robustness and cross-dataset generalization, with an accuracy of 75.5%, an MCC of 0.51, and a PR-AUC of 0.82. These results indicate that the proposed fusion-based ensemble framework provides an effective and reliable solution for RNA m5C site prediction.
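To make the encoding step concrete, the sketch below implements four of the six schemes named in the abstract (one-hot, NCP, TNC, and CKSNAP) in plain Python. It follows the definitions commonly used in sequence-feature toolkits, not necessarily the authors' implementation. The function names, the `normalize` helper, the `max_gap` default, and the NCP bit ordering (ring structure, functional group, hydrogen bonding) are assumptions made for illustration; the abstract does not state window lengths or parameter values.

```python
from itertools import product

ALPHABET = "ACGU"

# Commonly used NCP triplets (assumed ordering):
# (ring structure, functional group, hydrogen bonding).
NCP_TABLE = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "U": (0, 0, 1)}


def normalize(seq):
    """Upper-case the sequence and map DNA-style T to RNA U (hypothetical helper)."""
    return seq.upper().replace("T", "U")


def one_hot(seq):
    """Position-wise 4-bit indicator vector in A, C, G, U order."""
    seq = normalize(seq)
    return [1 if base == ref else 0 for base in seq for ref in ALPHABET]


def ncp(seq):
    """Position-wise 3-bit chemical-property vector."""
    seq = normalize(seq)
    return [bit for base in seq for bit in NCP_TABLE[base]]


def tnc(seq):
    """Frequencies of the 64 trinucleotides; needs at least 3 nucleotides."""
    seq = normalize(seq)
    kmers = ["".join(p) for p in product(ALPHABET, repeat=3)]
    total = len(seq) - 2  # number of overlapping 3-mers
    counts = dict.fromkeys(kmers, 0)
    for i in range(total):
        counts[seq[i:i + 3]] += 1
    return [counts[k] / total for k in kmers]


def cksnap(seq, max_gap=2):
    """Frequencies of nucleotide pairs separated by 0..max_gap positions.

    The sequence must be longer than max_gap + 1 so every gap has a pair.
    """
    seq = normalize(seq)
    pairs = ["".join(p) for p in product(ALPHABET, repeat=2)]
    features = []
    for gap in range(max_gap + 1):
        total = len(seq) - gap - 1  # number of pairs at this gap
        counts = dict.fromkeys(pairs, 0)
        for i in range(total):
            counts[seq[i] + seq[i + gap + 1]] += 1
        features.extend(counts[p] / total for p in pairs)
    return features
```

In a stacking setup like the one described, each of these feature vectors would feed its own base classifier, and the out-of-fold predictions of those classifiers would form the input to the meta-learner. That part is omitted here because the abstract does not name the classifiers.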
About the journal:
Nucleosides, Nucleotides & Nucleic Acids publishes research articles, short notices, and concise, critical reviews of related topics that focus on the chemistry and biology of nucleosides, nucleotides, and nucleic acids.
Complete with experimental details, this all-inclusive journal emphasizes the synthesis, biological activities, new and improved synthetic methods, and significant observations related to new compounds.