Genome Biology | Pub Date: 2025-03-11 | DOI: 10.1186/s13059-025-03495-9
Thatchayut Unjitwattana, Qianhui Huang, Yiwen Yang, Leyang Tao, Youqi Yang, Mengtian Zhou, Yuheng Du, Lana X. Garmire
{"title":"Single-cell RNA-seq data have prevalent blood contamination but can be rescued by Originator, a computational tool separating single-cell RNA-seq by genetic and contextual information","authors":"Thatchayut Unjitwattana, Qianhui Huang, Yiwen Yang, Leyang Tao, Youqi Yang, Mengtian Zhou, Yuheng Du, Lana X. Garmire","doi":"10.1186/s13059-025-03495-9","DOIUrl":"https://doi.org/10.1186/s13059-025-03495-9","url":null,"abstract":"Single-cell RNA sequencing (scRNA-seq) data from complex human tissues have prevalent blood cell contamination during the sample preparation process. They may also comprise cells of different genetic makeups. We propose a new computational framework, Originator, which deciphers single cells by genetic origin and separates immune cells of blood contamination from those of expected tissue-resident cells. We demonstrate the accuracy of Originator at separating immune cells from the blood and tissue as well as cells of different genetic origins, using a variety of artificially mixed and real datasets, including pancreatic cancer and placentas as examples.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"40 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2025-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143589834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recruitment and rejoining of remote double-strand DNA breaks for enhanced and precise chromosome editing","authors":"Mingyao Wang, Pengchong Fu, Ziheng Chen, Xiangnan Wang, Hanhui Ma, Xuedi Zhang, Guanjun Gao","doi":"10.1186/s13059-025-03523-8","DOIUrl":"https://doi.org/10.1186/s13059-025-03523-8","url":null,"abstract":"Chromosomal rearrangements, such as translocations, deletions, and inversions, underlie numerous genetic diseases and cancers, yet precise engineering of these rearrangements remains challenging. Here, we present a CRISPR-based homologous recombination-mediated rearrangement (HRMR) strategy that leverages homologous donor templates to align and repair broken chromosome ends. HRMR improves efficiency by approximately 80-fold compared to non-homologous end joining, achieving over 95% homologous recombination. Validated across multiple loci and cell lines, HRMR enables efficient and accurate chromosomal rearrangements. Live-cell imaging reveals that homologous donors mediate chromosome end proximity, enhancing rearrangement efficiency. Thus, HRMR provides a powerful tool for disease modeling, chromosomal biology, and therapeutic applications.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"14 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2025-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143589832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology | Pub Date: 2025-03-10 | DOI: 10.1186/s13059-025-03518-5
Robert Chen, Ben Omega Petrazzini, Áine Duffy, Ghislain Rocheleau, Daniel Jordan, Meena Bansal, Ron Do
{"title":"Trans-ancestral rare variant association study with machine learning-based phenotyping for metabolic dysfunction-associated steatotic liver disease","authors":"Robert Chen, Ben Omega Petrazzini, Áine Duffy, Ghislain Rocheleau, Daniel Jordan, Meena Bansal, Ron Do","doi":"10.1186/s13059-025-03518-5","DOIUrl":"https://doi.org/10.1186/s13059-025-03518-5","url":null,"abstract":"Genome-wide association studies (GWAS) have identified common variants associated with metabolic dysfunction-associated steatotic liver disease (MASLD). However, rare coding variant studies have been limited by phenotyping challenges and small sample sizes. We test associations of rare and ultra-rare coding variants with proton density fat fraction (PDFF) and MASLD case–control status in 736,010 participants of diverse ancestries from the UK Biobank, All of Us, and BioMe and performed a trans-ancestral meta-analysis. We then developed models to accurately predict PDFF and MASLD status in the UK Biobank and tested associations with these predicted phenotypes to increase statistical power. The trans-ancestral meta-analysis with PDFF and MASLD case–control status identifies two single variants and two gene-level associations in APOB, CDH5, MYCBP2, and XAB2. Association testing with predicted phenotypes, which replicates more known genetic variants from GWAS than true phenotypes, identifies 16 single variants and 11 gene-level associations implicating 23 additional genes. Two variants were polymorphic only among African ancestry participants and several associations showed significant heterogeneity in ancestry and sex-stratified analyses. In total, we identified 27 genes, of which 3 are monogenic causes of steatosis (APOB, G6PC1, PPARG), 4 were previously associated with MASLD (APOB, APOC3, INSR, PPARG), and 23 had supporting clinical, experimental, and/or genetic evidence. 
Our results suggest that trans-ancestral association analyses can identify ancestry-specific rare and ultra-rare coding variants in MASLD pathogenesis. Furthermore, we demonstrate the utility of machine learning in genetic investigations of difficult-to-phenotype diseases in trans-ancestral biobanks.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"192 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2025-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143582920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Precise engineering of gene expression by editing plasticity","authors":"Yang Qiu, Lifen Liu, Jiali Yan, Xianglei Xiang, Shouzhe Wang, Yun Luo, Kaixuan Deng, Jieting Xu, Minliang Jin, Xiaoyu Wu, Liwei Cheng, Ying Zhou, Weibo Xie, Hai-Jun Liu, Alisdair R. Fernie, Xuehai Hu, Jianbing Yan","doi":"10.1186/s13059-025-03516-7","DOIUrl":"https://doi.org/10.1186/s13059-025-03516-7","url":null,"abstract":"Identifying transcriptional cis-regulatory elements (CREs) and understanding their role in gene expression are essential for the precise manipulation of gene expression and associated phenotypes. This knowledge is fundamental for advancing genetic engineering and improving crop traits. We here demonstrate that CREs can be accurately predicted and utilized to precisely regulate gene expression beyond the range of natural variation. We firstly build two sequence-to-expression deep learning models to respectively identify distal and proximal CREs by combining them with interpretability methods in multiple crops. A large number of distal CREs are verified for enhancer activity in vitro using UMI-STARR-seq on 12,000 synthesized sequences. These comprehensively characterized CREs and their precisely predicted effects further contribute to the design of in silico editing schemes for precise engineering of gene expression. We introduce a novel concept of “editing plasticity” to evaluate the potential of promoter editing to alter expression of each gene. As a proof of concept, both exhaustive prediction and random knockout mutants are analyzed within the promoter region of ZmVTE4, a key gene affecting α-tocopherol content in maize. A high degree of agreement between predicted and observed expression is observed, extending the range of natural variation and thereby allowing the creation of an optimal phenotype. Our study provides a robust computational framework that advances knowledge-guided gene editing for precise regulation of gene expression and crop improvement. 
By reliably predicting and validating CREs, we offer a tool for targeted genetic modifications, enhancing desirable traits in crops.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"33 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2025-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143582919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology | Pub Date: 2025-03-07 | DOI: 10.1186/s13059-025-03511-y
Antoine Passemiers, Stefania Tuveri, Tatjana Jatsenko, Adriaan Vanderstichele, Pieter Busschaert, An Coosemans, Dirk Timmerman, Sabine Tejpar, Peter Vandenberghe, Diether Lambrechts, Daniele Raimondi, Joris Robert Vermeesch, Yves Moreau
{"title":"DAGIP: alleviating cell-free DNA sequencing biases with optimal transport","authors":"Antoine Passemiers, Stefania Tuveri, Tatjana Jatsenko, Adriaan Vanderstichele, Pieter Busschaert, An Coosemans, Dirk Timmerman, Sabine Tejpar, Peter Vandenberghe, Diether Lambrechts, Daniele Raimondi, Joris Robert Vermeesch, Yves Moreau","doi":"10.1186/s13059-025-03511-y","DOIUrl":"https://doi.org/10.1186/s13059-025-03511-y","url":null,"abstract":"Cell-free DNA (cfDNA) is a rich source of biomarkers for various pathophysiological conditions. Preanalytical variables, such as the library preparation protocol or sequencing platform, are major confounders of cfDNA analysis. We present DAGIP, a novel data correction method that builds on optimal transport theory and deep learning, which explicitly corrects for the effect of such preanalytical variables and can infer technical biases. Our method improves cancer detection and copy number alteration analysis by alleviating the sources of variation that are not of biological origin. It also enhances fragmentomic analysis of cfDNA. DAGIP allows the integration of cohorts from different studies.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"17 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2025-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143569558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology | Pub Date: 2025-03-06 | DOI: 10.1186/s13059-025-03512-x
Pol Vendrell-Mir, Basile Leduque, Leandro Quadrana
{"title":"Ultra-sensitive detection of transposon insertions across multiple families by transposable element display sequencing","authors":"Pol Vendrell-Mir, Basile Leduque, Leandro Quadrana","doi":"10.1186/s13059-025-03512-x","DOIUrl":"https://doi.org/10.1186/s13059-025-03512-x","url":null,"abstract":"Mobilization of transposable elements (TEs) can generate large effect mutations. However, due to the difficulty of detecting new TE insertions in genomes and the typically rare occurrence of transposition, the actual rate, distribution, and population dynamics of new insertions remain largely unexplored. We present a TE display sequencing approach that leverages target amplification of TE extremities to detect non-reference TE insertions with high specificity and sensitivity, enabling the detection of insertions at frequencies as low as 1 in 250,000 within a DNA sample. Moreover, this method allows the simultaneous detection of insertions for distinct TE families, including both retrotransposons and DNA transposons, enhancing its versatility and cost-effectiveness for investigating complex “mobilomes.” When combined with nanopore sequencing, this approach enables the identification of insertions using long-read information and achieves a turnaround time from DNA extraction to insertion identification of less than 24 h, significantly reducing the time-to-answer. By analyzing a population of Arabidopsis thaliana plants undergoing a transposition burst, we demonstrate the power of the multiplex TE display sequencing to analyze “evolve and resequence” experiments. Notably, we find that 3–4% of de novo TE insertions exhibit recurrent allele frequency changes indicative of either positive or negative selection. TE display sequencing is an ultra-sensitive, specific, simple, and cost-effective approach for investigating the rate and landscape of new TE insertions across multiple families in large-scale population experiments. 
We provide a step-by-step experimental protocol and ready-to-use bioinformatic pipelines to facilitate its straightforward implementation.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"15 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2025-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143560689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology | Pub Date: 2025-03-06 | DOI: 10.1186/s13059-025-03517-6
Qian Liu, Yang Liu, Congyang Yi, Zhi Gao, Zeyan Zhang, Congle Zhu, James A. Birchler, Fangpu Han
{"title":"Genome assembly of the maize B chromosome provides insight into its epigenetic characteristics and effects on the host genome","authors":"Qian Liu, Yang Liu, Congyang Yi, Zhi Gao, Zeyan Zhang, Congle Zhu, James A. Birchler, Fangpu Han","doi":"10.1186/s13059-025-03517-6","DOIUrl":"https://doi.org/10.1186/s13059-025-03517-6","url":null,"abstract":"B chromosomes contribute to the genetic variation in numerous eukaryotes. Yet their genetic and epigenetic characteristics, as well as their effects on the host genome remain poorly understood. Here, we present a comprehensive genome assembly of diploid maize B73 with two copies of B chromosomes using long-read sequencing. We annotate a total of 1124 high-confidence protein-coding genes and 119,579,190 bp repeat elements representing 88.55% of the B chromosome assembly. Using CENH3 ChIP-seq data, we accurately determined the position of the B chromosome centromere, which features a unique monomer-composed satellite array distinct from that found on the chromosome arms. Our research provides detailed genetic and epigenetic maps of the B chromosome, shedding light on its molecular landscape, including DNA sequence composition, DNA methylation patterns, histone modifications, and R-loop distributions across various chromatin regions. Consistent with the cytological morphology of the B chromosome, the less condensed euchromatin regions displayed high levels of H3K4me3, H3K9ac, gene expression, and dense R-loop distributions. DNA methylation on the B chromosome was primarily observed at CG sites. The centromeric region is notably enriched with H3K4me3 and H3K9ac histone modifications and has lower CHG methylation compared to the pericentromeric regions. Moreover, our findings reveal that B chromosome accumulation affects R-loop formation on A chromosomes, and exerts tissue-specific influences on A chromosome gene expression. 
The accurate assembly and detailed epigenetic maps of the maize B chromosome will help understand the drive mechanism, reveal its conflict with the host genome, and accelerate the construction of artificial chromosomes.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"50 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2025-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143560748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology | Pub Date: 2025-03-04 | DOI: 10.1186/s13059-025-03514-9
Chen-Yang Li, Yong-Jia Hong, Bo Li, Xiao-Fei Zhang
{"title":"Benchmarking single-cell cross-omics imputation methods for surface protein expression","authors":"Chen-Yang Li, Yong-Jia Hong, Bo Li, Xiao-Fei Zhang","doi":"10.1186/s13059-025-03514-9","DOIUrl":"https://doi.org/10.1186/s13059-025-03514-9","url":null,"abstract":"Recent advances in single-cell multimodal omics sequencing have facilitated the simultaneous profiling of transcriptomes and surface proteomes within individual cells, offering insights into cellular functions and heterogeneity. However, the high costs and technical complexity of protocols like CITE-seq and REAP-seq constrain large-scale dataset generation. To overcome this limitation, surface protein data imputation methods have emerged to predict protein abundances from scRNA-seq data. We present a comprehensive benchmark of twelve state-of-the-art imputation methods across eleven datasets and six scenarios. Our analysis evaluates the methods’ accuracy, sensitivity to training data size, robustness across experiments, and usability in terms of running time, memory usage, popularity, and user-friendliness. With benchmark experiments in diverse scenarios and a comprehensive evaluation framework of the results, our study offers valuable insights into the performance and applicability of surface protein data imputation methods in single-cell omics research. 
Based on our results, Seurat v4 (PCA) and Seurat v3 (PCA) demonstrate exceptional performance, offering promising avenues for further research in single-cell omics.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"29 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143538297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology | Pub Date: 2025-03-03 | DOI: 10.1186/s13059-025-03507-8
Josep Biayna, Gabrijela Dumbović
{"title":"Decoding subcellular RNA localization one molecule at a time","authors":"Josep Biayna, Gabrijela Dumbović","doi":"10.1186/s13059-025-03507-8","DOIUrl":"https://doi.org/10.1186/s13059-025-03507-8","url":null,"abstract":"Eukaryotic cells are highly structured and composed of multiple membrane-bound and membraneless organelles. Subcellular RNA localization is a critical regulator of RNA function, influencing various biological processes. At any given moment, RNAs must accurately navigate the three-dimensional subcellular environment to ensure proper localization and function, governed by numerous factors, including splicing, RNA stability, modifications, and localizing sequences. Aberrant RNA localization can contribute to the development of numerous diseases. Here, we explore diverse RNA localization mechanisms and summarize advancements in methods for determining subcellular RNA localization, highlighting imaging techniques transforming our ability to study RNA dynamics at the single-molecule level.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"17 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2025-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143532504","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology | Pub Date: 2025-02-28 | DOI: 10.1186/s13059-025-03509-6
Nicolas Antonio da Silva, Onur Özer, Magdalena Haller-Caskie, Yan-Rong Chen, Daniel Kolbe, Sabine Schade-Lindig, Joachim Wahl, Carola Berszin, Michael Francken, Irina Görner, Kerstin Schierhold, Joachim Pechtl, Gisela Grupe, Christoph Rinne, Johannes Müller, Tobias L. Lenz, Almut Nebel, Ben Krause-Kyora
{"title":"Admixture as a source for HLA variation in Neolithic European farming communities","authors":"Nicolas Antonio da Silva, Onur Özer, Magdalena Haller-Caskie, Yan-Rong Chen, Daniel Kolbe, Sabine Schade-Lindig, Joachim Wahl, Carola Berszin, Michael Francken, Irina Görner, Kerstin Schierhold, Joachim Pechtl, Gisela Grupe, Christoph Rinne, Johannes Müller, Tobias L. Lenz, Almut Nebel, Ben Krause-Kyora","doi":"10.1186/s13059-025-03509-6","DOIUrl":"https://doi.org/10.1186/s13059-025-03509-6","url":null,"abstract":"The northern European Neolithic is characterized by two major demographic events: immigration of early farmers from Anatolia at 7500 years before present, and their admixture with local western hunter-gatherers forming late farmers, from around 6200 years before present. The influence of this admixture event on variation in the immune-relevant human leukocyte antigen (HLA) region is understudied. We analyzed genome-wide data of 125 individuals from seven archeological early farmer and late farmer sites located in present-day Germany. The late farmer group studied here is associated with the Wartberg culture, from around 5500–4800 years before present. We note that late farmers resulted from sex-biased admixture from male western hunter-gatherers. In addition, we observe Y-chromosome haplogroup I as the dominant lineage in late farmers, with site-specific sub-lineages. We analyze true HLA genotypes from 135 Neolithic individuals, the majority of which were produced in this study. We observe significant shifts in HLA allele frequencies from early farmers to late farmers, likely due to admixture with western hunter-gatherers. Especially for the haplotype DQB1*04:01-DRB1*08:01, there is evidence for a western hunter-gatherer origin. The HLA diversity increased from early farmers to late farmers. However, it is considerably lower than in modern populations. Both early farmers and late farmers exhibit a relatively narrow HLA allele spectrum compared to today. 
This coincides with sparse traces of pathogen DNA, potentially indicating a lower pathogen pressure at the time.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"28 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2025-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143518778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}