Genome research | Pub Date: 2025-07-22 | DOI: 10.1101/gr.280175.124
Simone Andrea Biagini, Sara Becelaere, Mio Aerden, Tatjana Jatsenko, Laurens Hannes, Philip Van Damme, Jeroen Breckpot, Koenraad Devriendt, Bernard Thienpont, Joris Robert Vermeesch, Isabelle Cleynen, Toomas Kivisild
{"title":"Genotype imputation from low-coverage data for medical and population genetic analyses","authors":"Simone Andrea Biagini, Sara Becelaere, Mio Aerden, Tatjana Jatsenko, Laurens Hannes, Philip Van Damme, Jeroen Breckpot, Koenraad Devriendt, Bernard Thienpont, Joris Robert Vermeesch, Isabelle Cleynen, Toomas Kivisild","doi":"10.1101/gr.280175.124","DOIUrl":"https://doi.org/10.1101/gr.280175.124","url":null,"abstract":"Genotype imputation from low-pass sequencing data presents unique opportunities for genomic analyses but comes with specific challenges. In this study, we explore the impact of quality filters on genetic ancestry and Polygenic Score (PGS) estimation after imputing 32,769 low-pass genome wide sequences (LPS) from noninvasive prenatal screening (NIPS) with an average autosomal sequence depth of ~0.15×. In scenarios involving ultra-low coverage sequences, conventional approaches to enhance accuracy may fail, especially when multiple samples are pooled. To enhance the proportion of high-quality genotypes in large datasets we introduce a filtering approach called GDI that combines genotype probability (GP), alternate allele dosage (DS), and INFO score filters. We demonstrate that imputation tools QUILT and GLIMPSE2 achieve similar accuracy, which is high enough for broad-scale ancestry mapping but insufficient for high resolution Principal Component Analysis (PCA), when applied without filters. With the GDI approach we can achieve quality that is adequate for such purposes. Furthermore, we explored the impact of imputation errors, choice of variants and filtering methods on PGS prediction for height in 1,911 subjects with height data. We show that polygenic scores predict 23.7% of variance in height in our imputed data and that, contrary to the effect on PCA, the GDI filter does not improve the performance of PGS in height prediction. 
These results highlight that imputed LPS data can be leveraged for further biomedical and population genetic use but there is a need to consider each downstream analysis tool individually for its imputation quality thresholds and filtering requirements.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"16 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144684458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome research | Pub Date: 2025-07-22 | DOI: 10.1101/gr.280480.125
Hsin-Yen Larry Wu, Isaiah D. Kaufman, Polly Yingshan Hsu
{"title":"The ggRibo single-gene viewer reveals insights into translatome and other nucleotide-resolution omics data","authors":"Hsin-Yen Larry Wu, Isaiah D. Kaufman, Polly Yingshan Hsu","doi":"10.1101/gr.280480.125","DOIUrl":"https://doi.org/10.1101/gr.280480.125","url":null,"abstract":"Visualizing Ribo-seq and other sequencing data within genes of interest is a powerful approach to studying gene expression, but its application is limited by a lack of robust tools. Here, we introduce ggRibo, a user-friendly R package for visualizing individual gene expression, integrating Ribo-seq, RNA-seq, and other genome-wide datasets with flexible scaling options. ggRibo visualizes 3-nucleotide periodicity, a hallmark of translating ribosomes, within a gene-structure context, including introns and untranslated regions, enabling the study of novel ORFs, translation of different isoforms, and mechanisms of translational regulation. ggRibo can plot multiple Ribo-seq/RNA-seq datasets from different conditions for comparison. It also contains functions for plotting single-transcript view, reading-frame decomposition, and RNA-seq coverage alone. Importantly, ggRibo supports the visualization of other omics datasets that could also be presented with single-nucleotide resolution, such as RNA degradome, transcription start sites, translation initiation sites, and epitranscriptomic modifications. We demonstrate its utility with examples of upstream ORFs, downstream ORFs, nested ORFs, and differential isoform translation in humans, <em>Arabidopsis</em>, tomato, and rice. We also provide examples of multiomic comparisons that reveal insights that connect the transcriptome, translatome, and degradome. 
In summary, ggRibo is an advanced single-gene viewer that offers a valuable resource for studying gene expression regulation through its intuitive and flexible platform.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"16 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144684459","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome research | Pub Date: 2025-07-17 | DOI: 10.1101/gr.280808.125
Prashanthi Ravichandran, Princy Parsana, Rebecca Keener, Kasper Hansen, Alexis Battle
{"title":"Aggregation of recount3 RNA-seq data improves inference of consensus and tissue-specific gene coexpression networks","authors":"Prashanthi Ravichandran, Princy Parsana, Rebecca Keener, Kasper Hansen, Alexis Battle","doi":"10.1101/gr.280808.125","DOIUrl":"https://doi.org/10.1101/gr.280808.125","url":null,"abstract":"Gene coexpression networks (GCNs) describe relationships among genes that maintain cellular identity and homeostasis. However, typical RNA-seq experiments often lack sufficient sample sizes for reliable GCN inference. Recount3, a dataset with 316,443 processed human RNA-seq samples, provides an opportunity to improve network reconstruction. However, GCN inference from public data is challenged by confounders and inconsistent labeling. To address this, we developed a pipeline to annotate samples based on cell type composition. By comparing aggregation strategies, we found that regressing confounders within studies and prioritizing larger studies optimized network reconstruction. We applied these findings to infer three consensus networks (universal, cancer, non-cancer) and 27 context-specific networks. Central genes in consensus networks were enriched for evolutionarily constrained genes and ubiquitous biological pathways, while context-specific central nodes included tissue-specific transcription factors. The increased statistical power from data aggregation facilitated the derivation of variant annotations from context-specific networks, which were significantly enriched for complex-trait heritability independent of overlap with baseline functional genomic annotations. While data aggregation led to strictly increasing held-out log-likelihood, we observed diminishing marginal improvements, suggesting that integrating complementary modalities, such as Hi-C and ChIP-seq, could further refine network reconstruction. 
Our approach outlines best practices for GCN inference and highlights both the strengths and limitations of data aggregation.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"12 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144652093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome research | Pub Date: 2025-07-17 | DOI: 10.1101/gr.279501.124
Irene Mei, Susanne Nichterwitz, Melanie Leboeuf, Jik Nijssen, Isadora Lenoel, Dirk Repsilber, Christian S Lobsiger, Eva Hedlund
{"title":"Transcriptional modulation unique to vulnerable motor neurons predicts ALS across species and SOD1 mutations","authors":"Irene Mei, Susanne Nichterwitz, Melanie Leboeuf, Jik Nijssen, Isadora Lenoel, Dirk Repsilber, Christian S Lobsiger, Eva Hedlund","doi":"10.1101/gr.279501.124","DOIUrl":"https://doi.org/10.1101/gr.279501.124","url":null,"abstract":"Amyotrophic lateral sclerosis (ALS) is characterized by the progressive loss of motor neurons (MNs) that innervate skeletal muscles. However, certain MN groups, including ocular MNs, are relatively resilient. To reveal key drivers of resilience versus vulnerability in ALS, we investigate the transcriptional dynamics of four distinct MN populations in SOD1G93A ALS mice using LCM-seq and single molecule fluorescent in situ hybridization. We find that resilient ocular MNs regulate few genes in response to disease. Instead, they exhibit high baseline gene expression of neuroprotective factors including En1, Pvalb, Cd63 and Gal, some of which vulnerable MNs upregulate during disease. Vulnerable motor neuron groups upregulate both detrimental and regenerative responses to ALS and share pathway activation, indicating that breakdown occurs through similar mechanisms across vulnerable neurons, albeit with distinct timing. Meta-analysis across four rodent mutant SOD1 MN transcriptome datasets identifies a shared vulnerability code of 39 genes including <em>Atf4</em>, <em>Nupr1</em>, <em>Ddit3</em>, and <em>Penk</em>, involved in apoptosis, as well as a proregenerative and anti-apoptotic signature consisting of <em>Atf3</em>, <em>Vgf</em>, <em>Ina</em>, <em>Sprr1a</em>, <em>Fgf21</em>, <em>Gap43</em>, <em>Adcyap1</em>, and <em>Mt1</em>. Machine learning using genes upregulated in SOD1G93A spinal MN predicts disease in human stem cell-derived SOD1E100G MNs, and shows that dysregulation of <em>VGF</em>, <em>INA</em>, and <em>PENK</em> is a strong disease predictor across species and SOD1 mutations. 
Our study reveals MN population-specific gene expression and temporal disease-induced regulation that together provide a basis to explain ALS selective vulnerability and resilience and that can be used to predict disease.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"24 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144652092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome research | Pub Date: 2025-07-16 | DOI: 10.1101/gr.280728.125
Harsh G. Shukla, Mahul Chakraborty, J.J. Emerson
{"title":"Genetic variation in recalcitrant repetitive regions of the Drosophila melanogaster genome","authors":"Harsh G. Shukla, Mahul Chakraborty, J.J. Emerson","doi":"10.1101/gr.280728.125","DOIUrl":"https://doi.org/10.1101/gr.280728.125","url":null,"abstract":"Many essential functions of organisms are encoded in highly repetitive genomic regions, including histones involved in DNA packaging, centromeres that are core components of chromosome segregation, ribosomal RNA comprising the protein translation machinery, telomeres that ensure chromosome integrity, piRNA clusters encoding host defenses against selfish elements, and virtually the entire Y Chromosome. These regions, formed by highly similar tandem arrays, pose significant challenges for experimental and computational studies, impeding sequence-level descriptions essential for understanding genetic variation. Here, we report the assembly and variation analysis of such repetitive regions in <em>Drosophila melanogaster</em>, offering significant improvements to the existing community reference assembly. Our work successfully recovers previously elusive segments, including complete reconstructions of the histone locus and the pericentric heterochromatin of the X Chromosome, spanning the Stellate locus to the distal flank of the rDNA cluster. To infer structural changes in these regions where alignments are often not practicable, we introduce landmark anchors based on unique variants that are putatively orthologous. These regions display considerable structural variation between different <em>D. melanogaster</em> strains, exhibiting differences in copy number and organization of homologous repeat units between haplotypes. In the histone cluster, although we observe minimal genetic exchange indicative of meiotic crossing over, the variation patterns suggest mechanisms such as unequal sister chromatid exchange. 
We also examine the prevalence and scale of concerted evolution in the histone and Stellate clusters and discuss the mechanisms underlying these observed patterns.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"94 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144639724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome research | Pub Date: 2025-07-15 | DOI: 10.1101/gr.280304.124
Gabriela D. A. Guardia, Carlos H dos Anjos, Aline Rangel-Pozzo, Filipe F dos Santos, Alexander Birbrair, Paula F Asprino, Anamaria A Camargo, Pedro A F Galante
{"title":"Alternative splicing generates HER2 isoform diversity underlying antibody-drug conjugate resistance in breast cancer","authors":"Gabriela D. A. Guardia, Carlos H dos Anjos, Aline Rangel-Pozzo, Filipe F dos Santos, Alexander Birbrair, Paula F Asprino, Anamaria A Camargo, Pedro A F Galante","doi":"10.1101/gr.280304.124","DOIUrl":"https://doi.org/10.1101/gr.280304.124","url":null,"abstract":"Breast cancer (BC) is a heterogeneous disease that can be molecularly classified based on the expression of the ERBB2 receptor (also known as HER2) and hormone receptors. Targeted therapies for HER2-positive BC, such as trastuzumab, antibody-drug conjugates (ADCs) and tyrosine kinase inhibitors, have improved patient outcomes, but primary/acquired resistance still poses challenges that can limit treatments' long-term efficacy. Addressing these obstacles is vital for enhancing therapeutic strategies and patient care. Alternative splicing, a post-transcriptional mechanism that enhances transcript diversity (isoforms), can produce proteins with varied functions, cellular localizations, or binding properties. Here, we comprehensively characterized the HER2 alternative splicing isoforms, assessed their expression in primary BC patients and cell lines, and explored their role in resistance to anti-HER2 therapies. We expanded the catalog of known HER2 protein-coding isoforms from 13 to 90, revealing distinct patterns of protein domains, cellular localizations, and protein structures, along with their antibody-binding sites. By profiling expression in 561 primary BC samples and mass spectrometry data, we discovered a complex landscape of HER2 isoforms, revealing novel transcripts that were previously unrecognized and are not assessed in routine clinical practice. 
Finally, the assessment of HER2 isoform expression in BC cell cultures sensitive or resistant to trastuzumab and ADCs revealed that drug-resistant cells shifted their expression toward isoforms lacking antibody-binding domains. Our results broaden the understanding of HER2 isoforms, revealing distinct mechanisms of potential resistance to anti-HER2 therapies, particularly ADCs. This expanded landscape of HER2 isoforms emphasizes the crucial role of alternative splicing investigations in advancing precision-targeted cancer therapies.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"95 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144639723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome research | Pub Date: 2025-07-15 | DOI: 10.1101/gr.280274.124
Kazuki Ichikawa, Massa J. Shoura, Karen L. Artiles, Dae-Eun Jeong, Chie Owa, Haruka Kobayashi, Yoshihiko Suzuki, Manami Kanamori, Yu Toyoshima, Yuichi Iino, Ann E. Rougvie, Lamia Wahba, Andrew Z. Fire, Erich M. Schwarz, Shinichi Morishita
{"title":"CGC1, a new reference genome for Caenorhabditis elegans","authors":"Kazuki Ichikawa, Massa J. Shoura, Karen L. Artiles, Dae-Eun Jeong, Chie Owa, Haruka Kobayashi, Yoshihiko Suzuki, Manami Kanamori, Yu Toyoshima, Yuichi Iino, Ann E. Rougvie, Lamia Wahba, Andrew Z. Fire, Erich M. Schwarz, Shinichi Morishita","doi":"10.1101/gr.280274.124","DOIUrl":"https://doi.org/10.1101/gr.280274.124","url":null,"abstract":"The original 100.3 Mb reference genome for <em>Caenorhabditis elegans</em>, generated from the wild-type laboratory strain N2, has been crucial for analysis of <em>C. elegans</em> since 1998 and has been considered complete since 2005. Unexpectedly, this long-standing reference was shown to be incomplete in 2019 by a genome assembly from the N2-derived strain VC2010. Moreover, genetically divergent versions of N2 have arisen over decades of research and hindered reproducibility of <em>C. elegans</em> genetics and genomics. Here we provide a 106.4 Mb gap-free, telomere-to-telomere genome assembly of <em>C. elegans</em>, generated from CGC1, an isogenic derivative of the N2 strain. We use improved long-read sequencing and manual assembly of 43 recalcitrant genomic regions to overcome deficiencies of prior N2 and VC2010 assemblies and to assemble tandem repeat loci, including a 772 kb sequence for the 45S rRNA genes. Although many differences from earlier assemblies come from repeat regions, unique additions to the genome are also found. Of 19,972 protein-coding genes in the N2 assembly, 19,790 (99.1%) encode products that are unchanged in the CGC1 assembly. The CGC1 assembly also may encode 183 new protein-coding and 163 new ncRNA genes. CGC1 thus provides both a completely defined reference genome and corresponding isogenic wild-type strain for <em>C. 
elegans</em>, allowing unique opportunities for model and systems biology.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"1 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144629691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome research | Pub Date: 2025-07-15 | DOI: 10.1101/gr.280176.124
Meilin Zhang, Heng Du, Yu Zhang, Yue Zhuo, Zhen Liu, Yahui Xue, Lei Zhou, Sixuan Zhou, Wanying Li, Jian-Feng Liu
{"title":"A high-throughput screening method for selecting feature SNPs to evaluate breed diversity and infer ancestry","authors":"Meilin Zhang, Heng Du, Yu Zhang, Yue Zhuo, Zhen Liu, Yahui Xue, Lei Zhou, Sixuan Zhou, Wanying Li, Jian-Feng Liu","doi":"10.1101/gr.280176.124","DOIUrl":"https://doi.org/10.1101/gr.280176.124","url":null,"abstract":"As the scale of deep whole-genome sequencing (WGS) data has grown exponentially, hundreds of millions of single nucleotide polymorphisms (SNPs) have been identified in livestock. Utilizing these massive SNP data in population stratification analysis, ancestry prediction, and breed diversity assessments leads to overfitting issues in computational models and creates computational bottlenecks. Therefore, selecting genetic variants that express high amounts of information for use in population diversity studies and ancestry inference becomes critically important. Here, we develop a method, HITSNP, that combines feature selection and machine learning algorithms to select highly representative SNPs that can effectively estimate breed diversity and infer ancestry. HITSNP outperforms existing feature selection methods in estimation accuracy and computational stability. Furthermore, HITSNP offers a new algorithm to predict the number and composition of ancestral populations using a small number of SNPs, without having to calculate the number of clusters. 
Taken together, HITSNP facilitates the research of population structure, animal breeding, and animal resource protection.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"109 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144639721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome research | Pub Date: 2025-07-14 | DOI: 10.1101/gr.280281.124
Daijun Zhang, Ren Qi, Xun Lan, Bin Liu
{"title":"A novel multislice framework for precision 3D spatial domain reconstruction and disease pathology analysis","authors":"Daijun Zhang, Ren Qi, Xun Lan, Bin Liu","doi":"10.1101/gr.280281.124","DOIUrl":"https://doi.org/10.1101/gr.280281.124","url":null,"abstract":"The development of spatial transcriptomics (ST) technologies has revolutionized the way we map the complex organization and functions of tissues. These technologies offer valuable insights into the organization and function of complex biological systems. However, existing methods often focus too narrowly on single modalities or resolutions, thereby hindering the comprehensive capture of multilayered biological heterogeneity. Here, STMSC is proposed as a multislice joint analysis framework featuring a precorrection mechanism that enables the precise identification of complex spatial domains, advancing disease pathology insights. STMSC assumes that precise three-dimensional (3D) reconstruction is essential for an in-depth investigation of tissue components and mechanisms. Incorporating hematoxylin and eosin (H&E) imaging data, STMSC enhances slice alignment accuracy in 3D reconstruction. By deconstructing microenvironments, it reconstructs fine-grained cellular landscapes and emphasizes collective cellular behavior in defining spatial domains. Its graph attention autoencoder with precorrection balances biological information at different levels, improving the accuracy of ST analyses. By analyzing consecutive tissue slices and pathological data sets, STMSC accurately reconstructs 3D structures and provides deeper insights into complex cancer environments. 
Specifically, STMSC captures intra- and interstage heterogeneity in cancer development, offering novel insights into the complexity of pathological tissue structures.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"51 1 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144622370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome research | Pub Date: 2025-07-14 | DOI: 10.1101/gr.279957.124
Rachel M. Petersen, Christopher M. Vockley, Amanda J. Lea
{"title":"Uncovering methylation-dependent genetic effects on regulatory element function in diverse genomes","authors":"Rachel M. Petersen, Christopher M. Vockley, Amanda J. Lea","doi":"10.1101/gr.279957.124","DOIUrl":"https://doi.org/10.1101/gr.279957.124","url":null,"abstract":"A major goal in evolutionary biology and biomedicine is to understand the complex interactions between genetic variants, the epigenome, and gene expression. However, the causal relationships between these factors remain poorly understood. mSTARR-seq, a methylation-sensitive massively parallel reporter assay, is capable of identifying methylation-dependent regulatory activity at many thousands of genomic regions simultaneously and allows for the testing of causal relationships between DNA methylation and gene expression on a region-by-region basis. Here, we develop a multiplexed mSTARR-seq protocol to assay naturally occurring human genetic variation from 25 individuals from 10 localities in Europe and Africa. We identify 6957 regulatory elements in either the unmethylated or methylated state, and this set was enriched for enhancer and promoter chromatin annotations, as expected. The expression of 58% of these regulatory elements is modulated by methylation, which is generally associated with decreased transcription. Within our set of regulatory elements, we use allele-specific expression analyses to identify 8020 sites with genetic effects on gene regulation; further, we find that 42.3% of these genetic effects vary in direction or magnitude between methylated and unmethylated states. Sites exhibiting methylation-dependent genetic effects are enriched for GWAS and EWAS annotations, implicating them in human disease. 
Compared with data sets that assay DNA from a single European ancestry individual, our multiplexed assay is able to uncover more genetic effects and methylation-dependent genetic effects, highlighting the importance of including diverse genomes in assays that aim to understand gene regulatory processes.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"45 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144622378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}