{"title":"MLm5C:基于混合机器学习模型组合的高精度人类 RNA 5-甲基胞嘧啶位点预测器。","authors":"Hiroyuki Kurata , Md Harun-Or-Roshid , Md Mehedi Hasan , Sho Tsukiyama , Kazuhiro Maeda , Balachandran Manavalan","doi":"10.1016/j.ymeth.2024.05.004","DOIUrl":null,"url":null,"abstract":"<div><p>RNA modification serves as a pivotal component in numerous biological processes. Among the prevalent modifications, 5-methylcytosine (m5C) significantly influences mRNA export, translation efficiency and cell differentiation and are also associated with human diseases, including Alzheimer’s disease, autoimmune disease, cancer, and cardiovascular diseases. Identification of m5C is critically responsible for understanding the RNA modification mechanisms and the epigenetic regulation of associated diseases. However, the large-scale experimental identification of m5C present significant challenges due to labor intensity and time requirements. Several computational tools, using machine learning, have been developed to supplement experimental methods, but identifying these sites lack accuracy and efficiency. In this study, we introduce a new predictor, MLm5C, for precise prediction of m5C sites using sequence data. Briefly, we evaluated eleven RNA sequence-derived features with four basic machine learning algorithms to generate baseline models. From these 44 models, we ranked them based on their performance and subsequently stacked the Top 20 baseline models as the best model, named MLm5C. The MLm5C outperformed the-state-of-the-art predictors. Notably, the optimization of the sequence length surrounding the modification sites significantly improved the prediction performance. MLm5C is an invaluable tool in accelerating the detection of m5C sites within the human genome, thereby facilitating in the characterization of their roles in post-transcriptional regulation.</p></div>","PeriodicalId":390,"journal":{"name":"Methods","volume":"227 ","pages":"Pages 37-47"},"PeriodicalIF":4.2000,"publicationDate":"2024-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"MLm5C: A high-precision human RNA 5-methylcytosine sites predictor based on a combination of hybrid machine learning models\",\"authors\":\"Hiroyuki Kurata , Md Harun-Or-Roshid , Md Mehedi Hasan , Sho Tsukiyama , Kazuhiro Maeda , Balachandran Manavalan\",\"doi\":\"10.1016/j.ymeth.2024.05.004\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>RNA modification serves as a pivotal component in numerous biological processes. Among the prevalent modifications, 5-methylcytosine (m5C) significantly influences mRNA export, translation efficiency and cell differentiation and are also associated with human diseases, including Alzheimer’s disease, autoimmune disease, cancer, and cardiovascular diseases. Identification of m5C is critically responsible for understanding the RNA modification mechanisms and the epigenetic regulation of associated diseases. However, the large-scale experimental identification of m5C present significant challenges due to labor intensity and time requirements. Several computational tools, using machine learning, have been developed to supplement experimental methods, but identifying these sites lack accuracy and efficiency. In this study, we introduce a new predictor, MLm5C, for precise prediction of m5C sites using sequence data. Briefly, we evaluated eleven RNA sequence-derived features with four basic machine learning algorithms to generate baseline models. From these 44 models, we ranked them based on their performance and subsequently stacked the Top 20 baseline models as the best model, named MLm5C. The MLm5C outperformed the-state-of-the-art predictors. Notably, the optimization of the sequence length surrounding the modification sites significantly improved the prediction performance. MLm5C is an invaluable tool in accelerating the detection of m5C sites within the human genome, thereby facilitating in the characterization of their roles in post-transcriptional regulation.</p></div>\",\"PeriodicalId\":390,\"journal\":{\"name\":\"Methods\",\"volume\":\"227 \",\"pages\":\"Pages 37-47\"},\"PeriodicalIF\":4.2000,\"publicationDate\":\"2024-05-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Methods\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1046202324001142\",\"RegionNum\":3,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"BIOCHEMICAL RESEARCH METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Methods","FirstCategoryId":"99","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1046202324001142","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
MLm5C: A high-precision human RNA 5-methylcytosine sites predictor based on a combination of hybrid machine learning models
RNA modification serves as a pivotal component in numerous biological processes. Among the prevalent modifications, 5-methylcytosine (m5C) significantly influences mRNA export, translation efficiency and cell differentiation and are also associated with human diseases, including Alzheimer’s disease, autoimmune disease, cancer, and cardiovascular diseases. Identification of m5C is critically responsible for understanding the RNA modification mechanisms and the epigenetic regulation of associated diseases. However, the large-scale experimental identification of m5C present significant challenges due to labor intensity and time requirements. Several computational tools, using machine learning, have been developed to supplement experimental methods, but identifying these sites lack accuracy and efficiency. In this study, we introduce a new predictor, MLm5C, for precise prediction of m5C sites using sequence data. Briefly, we evaluated eleven RNA sequence-derived features with four basic machine learning algorithms to generate baseline models. From these 44 models, we ranked them based on their performance and subsequently stacked the Top 20 baseline models as the best model, named MLm5C. The MLm5C outperformed the-state-of-the-art predictors. Notably, the optimization of the sequence length surrounding the modification sites significantly improved the prediction performance. MLm5C is an invaluable tool in accelerating the detection of m5C sites within the human genome, thereby facilitating in the characterization of their roles in post-transcriptional regulation.
期刊介绍:
Methods focuses on rapidly developing techniques in the experimental biological and medical sciences.
Each topical issue, organized by a guest editor who is an expert in the area covered, consists solely of invited quality articles by specialist authors, many of them reviews. Issues are devoted to specific technical approaches with emphasis on clear detailed descriptions of protocols that allow them to be reproduced easily. The background information provided enables researchers to understand the principles underlying the methods; other helpful sections include comparisons of alternative methods giving the advantages and disadvantages of particular methods, guidance on avoiding potential pitfalls, and suggestions for troubleshooting.