Nm-Nano: a machine learning framework for transcriptome-wide single-molecule mapping of 2´-O-methylation (Nm) sites in nanopore direct RNA sequencing datasets.

Impact factor 3.6; Zone 3 (Biology); JCR Q2 (Biochemistry & Molecular Biology)
RNA Biology Pub Date : 2024-01-01 Epub Date: 2024-05-17 DOI:10.1080/15476286.2024.2352192
Doaa Hassan, Aditya Ariyur, Swapna Vidhur Daulatabad, Quoseena Mir, Sarath Chandra Janga
Citation count: 0

Abstract

2´-O-methylation (Nm) is one of the most abundant modifications found in both mRNAs and noncoding RNAs. It contributes to many biological processes, such as the normal functioning of tRNA, the protection of mRNA against degradation by the decapping and exoribonuclease (DXO) protein, and the biogenesis and specificity of rRNA. Recent advancements in single-molecule sequencing techniques for long-read RNA sequencing data offered by Oxford Nanopore Technologies have enabled the direct detection of RNA modifications from sequencing data. In this study, we propose a bio-computational framework, Nm-Nano, for predicting the presence of Nm sites in direct RNA sequencing data generated from two human cell lines. The Nm-Nano framework integrates two supervised machine learning (ML) models for predicting Nm sites: Extreme Gradient Boosting (XGBoost) and Random Forest (RF) with k-mer embedding. Evaluation on benchmark datasets from direct RNA sequencing of HeLa and HEK293 cell lines demonstrates high accuracy (99% with XGBoost and 92% with RF) in identifying Nm sites. Deploying Nm-Nano on HeLa and HEK293 cell lines reveals genes that are frequently modified with Nm. In HeLa cells, 125 genes are identified as frequently Nm-modified, showing enrichment in 30 ontologies related to immune response and cellular processes. In HEK293 cells, 61 genes are identified as frequently Nm-modified, with enrichment in processes such as glycolysis and protein localization. These findings underscore the diverse regulatory roles of Nm modifications in metabolic pathways, protein degradation, and cellular processes. The source code of Nm-Nano can be freely accessed at https://github.com/Janga-Lab/Nm-Nano.
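
The abstract names k-mer features and an accuracy figure but gives no implementation details. As a rough illustration only, the Python sketch below splits a read into overlapping k-mers, encodes each one as a one-hot vector (a deliberately simpler stand-in for the learned k-mer embedding the abstract mentions), and computes accuracy as the fraction of correctly labeled sites. The function names, the choice of k = 5, the ACGU alphabet, and the example read are all assumptions made for this sketch; none of them is taken from the Nm-Nano source code.

```python
BASES = "ACGU"  # assumed RNA alphabet for this sketch


def kmers(seq, k=5):
    """Return all overlapping k-mers of a sequence, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def one_hot(kmer):
    """Encode a k-mer as a flat 0/1 vector with four slots per base.

    This is a stand-in for a learned k-mer embedding, not the paper's method.
    """
    vec = []
    for base in kmer:
        vec.extend(1 if base == b else 0 for b in BASES)
    return vec


def accuracy(y_true, y_pred):
    """Fraction of sites whose predicted label matches the reference label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


read = "GGACUAGC"  # hypothetical read fragment
features = [one_hot(km) for km in kmers(read)]
print(len(features), len(features[0]))
```

Feature vectors of this kind, typically combined with signal-level features from the sequencer, could then be passed to any supervised classifier; the choice of classifier is independent of the encoding shown here.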

Source journal
RNA Biology (Biology: Biochemistry & Molecular Biology)
CiteScore: 8.60
Self-citation rate: 0.00%
Articles published: 82
Review time: 1 month
About the journal: RNA has played a central role in all cellular processes since the beginning of life: decoding the genome, regulating gene expression, mediating molecular interactions, and catalyzing chemical reactions. RNA Biology, as a leading journal in the field, provides a platform for presenting and discussing cutting-edge RNA research. RNA Biology brings together a multidisciplinary community of scientists working in the areas of: transcription and splicing; post-transcriptional regulation of gene expression; non-coding RNAs; RNA localization; translation and catalysis by RNA; structural biology; bioinformatics; and RNA in disease and therapy.