Thomas Vanhaeren, Ludovica Cataneo, Federico Divina, Pedro Manuel Martínez-García
NAR Genomics and Bioinformatics, volume 7, issue 2, lqaf077. Published 2025-06-11. DOI: https://doi.org/10.1093/nargab/lqaf077. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12153340/pdf/
Enhancing R-loop prediction with high-throughput sequencing data.
R-loops are three-stranded RNA-DNA hybrid structures that occur frequently in the genome and play important roles in a variety of cellular processes from bacteria to mammals. Sequencing methods that profile R-loops genome-wide have revealed that they can form co-transcriptionally at cell type-specific genes and associate with specific chromatin states during cell differentiation and reprogramming. However, current computational methods for R-loop prediction rely solely on DNA sequence properties, which precludes detecting differences between cell types, tissues or developmental stages. Here, we present a machine learning approach that predicts mammalian cell type-specific R-loops using sequence information and high-throughput sequencing signals. Our predictive models are trained on human samples and achieve highly accurate predictions, with transcriptomics, DNA features, chromatin accessibility and the active gene body epigenomic mark H3K36me3 being the most informative datasets. We generate de novo virtual R-loop maps that show high concordance with experimental ones and capture cell type specificity. Our approach compares favorably to sequence-based methods and can be generalized to mouse datasets. On this basis, we generate virtual R-loop maps in 51 mammalian systems that are freely accessible to the scientific community.
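As a rough illustration of what combining "sequence information and high-throughput sequencing signals" could look like in practice, the sketch below builds one feature vector per genomic window. It is not the authors' pipeline: the abstract does not say which DNA features are used, so GC content and GC skew are assumed here as common sequence descriptors, and the names `GenomicBin`, `SIGNAL_ORDER`, `feature_vector` and the signal keys are hypothetical. The numeric values are made up purely to show the data shape.

```python
from dataclasses import dataclass
from typing import Dict, List


def gc_content(seq: str) -> float:
    """Fraction of G or C bases in the window (0.0 for an empty window)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_skew(seq: str) -> float:
    """(G - C) / (G + C); 0.0 when the window contains no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


@dataclass
class GenomicBin:
    """One genomic window: its DNA sequence plus per-assay signal values.

    Hypothetical container; the signal keys are illustrative stand-ins for
    the dataset types named in the abstract.
    """
    sequence: str
    signals: Dict[str, float]


# Assumed, fixed ordering so every bin yields a vector with the same layout.
SIGNAL_ORDER = ["rna_seq", "accessibility", "h3k36me3"]


def feature_vector(b: GenomicBin) -> List[float]:
    """Concatenate sequence-derived features with sequencing-signal features."""
    dna_features = [gc_content(b.sequence), gc_skew(b.sequence)]
    # Assays missing for a bin default to 0.0 (an assumption of this sketch).
    signal_features = [b.signals.get(name, 0.0) for name in SIGNAL_ORDER]
    return dna_features + signal_features


# Toy window with invented signal values, for illustration only.
example = GenomicBin(
    sequence="GGGCGGAT",
    signals={"rna_seq": 12.5, "accessibility": 3.0, "h3k36me3": 1.5},
)
vec = feature_vector(example)
```

Vectors of this kind, paired with labels from an experimental R-loop map, could then be passed to any standard supervised classifier. The sequence-derived entries are the same in every cell type, while the signal entries vary, which is what would allow cell type-specific predictions.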