Enhancing R-loop prediction with high-throughput sequencing data.

Impact factor 2.8; JCR Q1, Genetics & Heredity
NAR Genomics and Bioinformatics; published 2025-06-11 (eCollection date 2025-06-01); DOI: 10.1093/nargab/lqaf077
Thomas Vanhaeren, Ludovica Cataneo, Federico Divina, Pedro Manuel Martínez-García
Volume 7, Issue 2, article lqaf077. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12153340/pdf/
Citations: 0

Abstract

R-loops are three-stranded nucleic acid structures, made up of an RNA:DNA hybrid and a displaced single DNA strand. They occur frequently in the genome and play important roles in a variety of cellular processes, from bacteria to mammals. Sequencing methods that profile R-loops genome-wide have revealed that they can form co-transcriptionally at cell type-specific genes and are associated with specific chromatin states during cell differentiation and reprogramming. However, current computational methods for predicting R-loops rely solely on DNA sequence properties, which precludes detection across cell types, tissues or developmental stages. Here, we develop a machine learning approach that predicts mammalian cell type-specific R-loops from sequence information and high-throughput sequencing signals. Our predictive models are trained on human samples and achieve highly accurate predictions. The most informative datasets are transcriptomics, DNA features, chromatin accessibility and the active gene body epigenomic mark H3K36me3. We generate de novo virtual R-loop maps that show high concordance with experimental ones and capture cell type specificity. Our approach compares favorably to sequence-based methods and can be generalized to mouse datasets. Based on this, we generate virtual R-loop maps in 51 mammalian systems that are freely accessible to the scientific community.
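The abstract lists DNA features and sequencing signals (transcriptomics, chromatin accessibility, H3K36me3) as model inputs, but it does not say how they are encoded. The sketch below is a hedged illustration only and rests on several assumptions:

- The genome is tiled into fixed-size windows, and each window gets one feature row.
- Each row holds GC content and GC skew, (G - C)/(G + C). Positive GC skew is a sequence property widely associated with R-loop formation.
- Each row also holds the mean of every per-base signal track over that window.

The function name `window_features`, the window size, the track name and the toy values are all hypothetical. This is not the authors' feature set or pipeline.

```python
from statistics import mean


def window_features(seq, signals, window=100, step=100):
    """Hypothetical sketch: one feature row per window of a DNA sequence.

    seq: DNA string (case-insensitive).
    signals: dict mapping a track name to a list of per-base values
             with the same length as seq (assumed input layout).
    Trailing bases that do not fill a whole window are ignored.
    """
    seq = seq.upper()
    # Every signal track must align base-by-base with the sequence.
    for name, track in signals.items():
        if len(track) != len(seq):
            raise ValueError(
                f"track {name!r} has length {len(track)}, expected {len(seq)}"
            )
    rows = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        row = {
            "start": start,
            "end": start + window,
            # Fraction of G or C bases in the window.
            "gc_content": (g + c) / window,
            # GC skew; defined as 0.0 when the window has no G or C.
            "gc_skew": (g - c) / (g + c) if g + c else 0.0,
        }
        # Mean signal of each track over the same window, in sorted track order.
        for name, track in sorted(signals.items()):
            row[name] = mean(track[start:start + window])
        rows.append(row)
    return rows


# Toy usage: a 24 bp sequence and one hypothetical coverage track.
seq = "GGGGCC" * 2 + "ATATAT" * 2
signals = {"H3K36me3": [1.0] * 12 + [0.0] * 12}
rows = window_features(seq, signals, window=12, step=12)
for row in rows:
    print(row)
```

Dropping the trailing partial window and averaging each signal per window are simplifying choices. In practice, coverage would typically come from bigWig files and sequence from a reference FASTA, and the resulting rows would then be fed to a classifier.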

Source journal metrics
CiteScore: 8.00
Self-citation rate: 2.20%
Articles published: 95
Review time: 15 weeks