Detection and Quantification of 5moU RNA Modification from Direct RNA Sequencing Data

IF 1.4 | Zone 4 (Biology) | JCR Q4 (Biochemistry & Molecular Biology)

Current Genomics | Pub Date: 2024-04-17 | DOI: 10.2174/0113892029288843240402042529

Jiayi Li, Feiyang Sun, Kunyang He, Lin Zhang, Jia Meng, Daiyun Huang, Yuxin Zhang


Citations: 0

Abstract


Detection and Quantification of 5moU RNA Modification from Direct RNA Sequencing Data

Background: Chemically modified therapeutic mRNAs have gained momentum recently. In addition to commonly used modifications (e.g., pseudouridine), 5-methoxyuridine (5moU) is considered a promising substitute for uridine in therapeutic mRNAs. Accurate identification of 5moU is therefore crucial for the study and quality control of the relevant in vitro-transcribed (IVT) mRNAs. However, current methods fall short of providing a quantitative way to detect this modification. Using Oxford Nanopore direct RNA sequencing, we present NanoML-5moU, a machine-learning framework designed specifically for read-level detection and quantification of 5moU modification in IVT data.

Method: Nanopore direct RNA sequencing data were collected from both 5moU-modified and unmodified control samples. Signal event characteristics (mean and median current intensities, standard deviations, and dwell times) were then analyzed and modeled. Classical machine-learning algorithms, namely Support Vector Machine (SVM), Random Forest (RF), and XGBoost, were used to detect 5moU modifications within NNUNN 5-mers (where N represents A, C, U, or G).

Result: Combining the signal event attributes of each constituent base of the NNUNN 5-mers with the XGBoost algorithm achieved strong performance, with a maximum AUROC of 0.9567 on the "AGTTC" reference 5-mer dataset and a minimum AUROC of 0.8113 on the "TGTGC" reference 5-mer dataset. This markedly exceeded the prevailing background error comparison model (ELIGOS, AUC 0.751 for site-level prediction). The model was further validated on a series of curated datasets with customized modification ratios designed to emulate broader data patterns, demonstrating its general applicability to quality control of IVT mRNA vaccines. The NanoML-5moU framework is publicly available on GitHub (https://github.com/JiayiLi21/NanoML-5moU).

Conclusion: NanoML-5moU enables accurate read-level profiling of 5moU modification from nanopore direct RNA sequencing and is a tool specialized in revealing signal patterns in IVT mRNAs.
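The quantities named in the Method and Result paragraphs can be made concrete with a short Python sketch. It is not taken from the NanoML-5moU repository: the function names, the event layout (current samples plus a dwell time per base), the use of the population standard deviation, and the 0.5 calling threshold are all assumptions made for illustration. The sketch enumerates the NNUNN contexts, flattens per-base event statistics into one feature vector per read, scores a set of read-level predictions with AUROC, and turns read-level calls into a modification ratio. The classifier itself (SVM, RF, or XGBoost in the abstract) is deliberately left out; any model that emits a per-read score could sit between the feature step and the evaluation step.

```python
from itertools import product
from statistics import mean, median, pstdev

BASES = "ACGU"


def nnunn_kmers():
    """Return all NNUNN 5-mers: U fixed at the centre, N in {A, C, G, U}."""
    return [a + b + "U" + c + d for a, b, c, d in product(BASES, repeat=4)]


def to_reference(kmer):
    """Spell an RNA 5-mer in the reference (DNA) alphabet, e.g. AGUUC -> AGTTC."""
    return kmer.replace("U", "T")


def event_features(events):
    """Flatten per-base signal events into one feature vector for a read.

    `events` is a hypothetical layout: five items, one per base of the
    5-mer, each a pair (current_samples, dwell_time).
    """
    if len(events) != 5:
        raise ValueError("expected one event per base of the 5-mer")
    features = []
    for samples, dwell in events:
        # Mean, median, standard deviation, and dwell time for this base.
        # Population standard deviation is an assumption of this sketch.
        features += [mean(samples), median(samples), pstdev(samples), dwell]
    return features  # 5 bases x 4 statistics = 20 values


def auroc(labels, scores):
    """AUROC as the probability that a modified read outscores an unmodified one.

    Uses the pairwise (Mann-Whitney) formulation; ties count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one read of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def modification_ratio(probabilities, threshold=0.5):
    """Fraction of reads called modified at the given (assumed) threshold."""
    if not probabilities:
        raise ValueError("no reads supplied")
    return sum(p >= threshold for p in probabilities) / len(probabilities)
```

Under these assumptions, each read covering a given NNUNN context yields a 20-value vector, a trained classifier maps that vector to a probability, and `modification_ratio` summarizes the per-read calls for a site or sample. The pairwise AUROC is quadratic in the number of reads, which is fine for illustration but would be replaced by a rank-based computation on real datasets.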


Source journal

Current Genomics (Biology: Biochemistry & Molecular Biology)

CiteScore: 5.20

Self-citation rate: 0.00%

Articles published: (not listed)

Review time: >0 weeks

About the journal: Current Genomics is a peer-reviewed journal that provides essential reading about the latest and most important developments in genome science and related fields of research. Topics of particular interest include systems biology, systems modeling, machine learning, network inference, bioinformatics, computational biology, epigenetics, single-cell genomics, extracellular vesicles, quantitative biology, and synthetic biology, applied to the study of evolution, development, maintenance, and aging, as well as human health, human diseases, clinical genomics, and precision medicine. The journal covers plant genomics but will not consider articles dealing with breeding and livestock.

Current Genomics publishes three types of articles:

i) Research papers from internationally recognized experts reporting new and original data generated at the genome scale. Position papers dealing with new or challenging methodological approaches, whether experimental or mathematical, are greatly welcome in this section.

ii) Authoritative and comprehensive full-length or mini reviews from widely recognized experts, covering the latest developments in genome science and related fields of research such as systems biology, statistics and machine learning, quantitative biology, and precision medicine. Proposals for mini hot topics (2-3 review papers) and full hot topics (6-8 review papers) guest edited by internationally recognized experts are welcome in this section. Hot topic proposals should not contain original data and should contain articles originating from at least 2 different countries.

iii) Opinion papers from internationally recognized experts addressing contemporary questions and issues in the field of genome science and systems biology and in basic and clinical research practices.