{"title":"Benchmarking clustering, alignment, and integration methods for spatial transcriptomics","authors":"Yunfei Hu, Manfei Xie, Yikang Li, Mingxing Rao, Wenjun Shen, Can Luo, Haoran Qin, Jihoon Baek, Xin Maizie Zhou","doi":"10.1186/s13059-024-03361-0","DOIUrl":"https://doi.org/10.1186/s13059-024-03361-0","url":null,"abstract":"Spatial transcriptomics (ST) is advancing our understanding of complex tissues and organisms. However, building a robust clustering algorithm to define spatially coherent regions in a single tissue slice and aligning or integrating multiple tissue slices originating from diverse sources for essential downstream analyses remains challenging. Numerous clustering, alignment, and integration methods have been specifically designed for ST data by leveraging its spatial information. The absence of comprehensive benchmark studies complicates the selection of methods and future method development. In this study, we systematically benchmark a variety of state-of-the-art algorithms with a wide range of real and simulated datasets of varying sizes, technologies, species, and complexity. We analyze the strengths and weaknesses of each method using diverse quantitative and qualitative metrics and analyses, including eight metrics for spatial clustering accuracy and contiguity, uniform manifold approximation and projection visualization, layer-wise and spot-to-spot alignment accuracy, and 3D reconstruction, which are designed to assess method performance as well as data quality. The code used for evaluation is available on our GitHub. Additionally, we provide online notebook tutorials and documentation to facilitate the reproduction of all benchmarking results and to support the study of new methods and new datasets. 
Our analyses lead to comprehensive recommendations that cover multiple aspects, helping users to select optimal tools for their specific needs and guide future method development.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"33 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141909014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Creating large-scale genetic diversity in Arabidopsis via base editing-mediated deep artificial evolution","authors":"Xiang Wang, Wenbo Pan, Chao Sun, Hong Yang, Zhentao Cheng, Fei Yan, Guojing Ma, Yun Shang, Rui Zhang, Caixia Gao, Lijing Liu, Huawei Zhang","doi":"10.1186/s13059-024-03358-9","DOIUrl":"https://doi.org/10.1186/s13059-024-03358-9","url":null,"abstract":"Base editing is a powerful tool for artificial evolution to create allelic diversity and improve agronomic traits. However, the great evolutionary potential of every sgRNA target has been overlooked, and there is currently no high-throughput method for generating and characterizing as many changes in a single target as possible based on large mutant pools to permit rapid gene directed evolution in plants. In this study, we establish an efficient germline-specific evolution system to screen beneficial alleles in Arabidopsis, which could be applied to crop improvement. This system is based on a strong egg cell-specific cytosine base editor and the large seed production of Arabidopsis, which enables each T1 plant with unedited wild type alleles to produce thousands of independent T2 mutant lines. It can create a wide range of mutant lines, including those containing atypical base substitutions, and also provides a space- and labor-saving way to store and screen the resulting mutant libraries. Using this system, we efficiently generate herbicide-resistant EPSPS, ALS, and HPPD variants that could be used in crop breeding. 
Here, we demonstrate the significant potential of base editing-mediated artificial evolution for each sgRNA target and devise an efficient system for conducting deep evolution to harness this potential.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"367 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141909082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology, Pub Date: 2024-08-09, DOI: 10.1186/s13059-024-03360-1
Ilke Demirci, Anton J. M. Larsson, Xinsong Chen, Johan Hartman, Rickard Sandberg, Jonas Frisén
{"title":"Inferring clonal somatic mutations directed by X chromosome inactivation status in single cells","authors":"Ilke Demirci, Anton J. M. Larsson, Xinsong Chen, Johan Hartman, Rickard Sandberg, Jonas Frisén","doi":"10.1186/s13059-024-03360-1","DOIUrl":"https://doi.org/10.1186/s13059-024-03360-1","url":null,"abstract":"Analysis of clonal dynamics in human tissues is enabled by somatic genetic variation. Here, we show that analysis of mitochondrial mutations in single cells is dramatically improved in females when using X chromosome inactivation to select informative clonal mutations. Applying this strategy to human peripheral mononuclear blood cells reveals clonal structures within T cells that otherwise are blurred by non-informative mutations, including the separation of gamma-delta T cells, suggesting this approach can be used to decipher clonal dynamics of cells in human tissues.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"1 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141909012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology, Pub Date: 2024-08-08, DOI: 10.1186/s13059-024-03354-z
Andrea Cipriano, Alessio Colantoni, Alessandro Calicchio, Jonathan Fiorentino, Danielle Gomes, Mahdi Moqri, Alexander Parker, Sajede Rasouli, Matthew Caldwell, Francesca Briganti, Maria Grazia Roncarolo, Antonio Baldini, Katja G. Weinacht, Gian Gaetano Tartaglia, Vittorio Sebastiano
{"title":"Transcriptional and epigenetic characterization of a new in vitro platform to model the formation of human pharyngeal endoderm","authors":"Andrea Cipriano, Alessio Colantoni, Alessandro Calicchio, Jonathan Fiorentino, Danielle Gomes, Mahdi Moqri, Alexander Parker, Sajede Rasouli, Matthew Caldwell, Francesca Briganti, Maria Grazia Roncarolo, Antonio Baldini, Katja G. Weinacht, Gian Gaetano Tartaglia, Vittorio Sebastiano","doi":"10.1186/s13059-024-03354-z","DOIUrl":"https://doi.org/10.1186/s13059-024-03354-z","url":null,"abstract":"The Pharyngeal Endoderm (PE) is an extremely relevant developmental tissue, serving as the progenitor for the esophagus, parathyroids, thyroids, lungs, and thymus. While several studies have highlighted the importance of PE cells, a detailed transcriptional and epigenetic characterization of this important developmental stage is still missing, especially in humans, due to technical and ethical constraints pertaining to its early formation. Here we fill this knowledge gap by developing an in vitro protocol for the derivation of PE-like cells from human Embryonic Stem Cells (hESCs) and by providing an integrated multi-omics characterization. Our PE-like cells robustly express PE markers and are transcriptionally homogenous and similar to in vivo mouse PE cells. In addition, we define their epigenetic landscape and dynamic changes in response to Retinoic Acid by combining ATAC-Seq and ChIP-Seq of histone modifications. The integration of multiple high-throughput datasets leads to the identification of new putative regulatory regions and to the inference of a Retinoic Acid-centered transcription factor network orchestrating the development of PE-like cells. 
By combining hESC differentiation with computational genomics, our work reveals the epigenetic dynamics that occur during human PE differentiation, providing a solid resource and foundation for research focused on the development of PE derivatives and the modeling of their developmental defects in genetic syndromes.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"52 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141904320","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology, Pub Date: 2024-08-06, DOI: 10.1186/s13059-024-03340-5
Vincent Jonchère, Hugo Montémont, Enora Le Scanf, Aurélie Siret, Quentin Letourneur, Emmanuel Tubacher, Christophe Battail, Assane Fall, Karim Labreche, Victor Renault, Toky Ratovomanana, Olivier Buhard, Ariane Jolly, Philippe Le Rouzic, Cody Feys, Emmanuelle Despras, Habib Zouali, Rémy Nicolle, Pascale Cervera, Magali Svrcek, Pierre Bourgoin, Hélène Blanché, Anne Boland, Jérémie Lefèvre, Yann Parc, Mehdi Touat, Franck Bielle, Danielle Arzur, Gwennina Cueff, Catherine Le Jossic-Corcos, Gaël Quéré, Gwendal Dujardin, Marc Blondel, Cédric Le Maréchal, Romain Cohen, Thierry André, Florence Coulet, Pierre de la Grange, Aurélien de Reyniès, Jean-François Fléjou, Florence Renaud, Agusti Alentorn, Laurent Corcos, Jean-François Deleuze, Ada Collura, Alex Duval
{"title":"Microsatellite instability at U2AF-binding polypyrimidic tract sites perturbs alternative splicing during colorectal cancer initiation","authors":"Vincent Jonchère, Hugo Montémont, Enora Le Scanf, Aurélie Siret, Quentin Letourneur, Emmanuel Tubacher, Christophe Battail, Assane Fall, Karim Labreche, Victor Renault, Toky Ratovomanana, Olivier Buhard, Ariane Jolly, Philippe Le Rouzic, Cody Feys, Emmanuelle Despras, Habib Zouali, Rémy Nicolle, Pascale Cervera, Magali Svrcek, Pierre Bourgoin, Hélène Blanché, Anne Boland, Jérémie Lefèvre, Yann Parc, Mehdi Touat, Franck Bielle, Danielle Arzur, Gwennina Cueff, Catherine Le Jossic-Corcos, Gaël Quéré, Gwendal Dujardin, Marc Blondel, Cédric Le Maréchal, Romain Cohen, Thierry André, Florence Coulet, Pierre de la Grange, Aurélien de Reyniès, Jean-François Fléjou, Florence Renaud, Agusti Alentorn, Laurent Corcos, Jean-François Deleuze, Ada Collura, Alex Duval","doi":"10.1186/s13059-024-03340-5","DOIUrl":"https://doi.org/10.1186/s13059-024-03340-5","url":null,"abstract":"Microsatellite instability (MSI) due to mismatch repair deficiency (dMMR) is common in colorectal cancer (CRC). These cancers are associated with somatic coding events, but the noncoding pathophysiological impact of this genomic instability is still poorly understood. Here, we perform an analysis of coding and noncoding MSI events at the different steps of colorectal tumorigenesis using whole exome sequencing and search for associated splicing events via RNA sequencing at the bulk-tumor and single-cell levels. Our results demonstrate that MSI leads to hundreds of noncoding DNA mutations, notably at polypyrimidine U2AF RNA-binding sites which are endowed with cis-activity in splicing, while a higher frequency of exon skipping events is observed in the mRNAs of MSI compared to non-MSI CRC. 
At the DNA level, these noncoding MSI mutations occur very early prior to cell transformation in the dMMR colonic crypt, accounting for only a fraction of the exon skipping in MSI CRC. At the RNA level, the aberrant exon skipping signature is likely to impair colonic cell differentiation in MSI CRC affecting the expression of alternative exons encoding protein isoforms governing cell fate, while also targeting constitutive exons, making dMMR cells immunogenic at an early stage before the onset of coding mutations. This signature is characterized by its similarity to the oncogenic U2AF1-S34F splicing mutation observed in several other non-MSI cancers. Overall, these findings provide evidence that a very early RNA splicing signature partly driven by MSI impairs cell differentiation and promotes MSI CRC initiation, far before coding mutations which accumulate later during MSI tumorigenesis.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"38 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141895234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology, Pub Date: 2024-08-06, DOI: 10.1186/s13059-024-03352-1
Simon C. Biddie, Giovanna Weykopf, Elizabeth F. Hird, Elias T. Friman, Wendy A. Bickmore
{"title":"DNA-binding factor footprints and enhancer RNAs identify functional non-coding genetic variants","authors":"Simon C. Biddie, Giovanna Weykopf, Elizabeth F. Hird, Elias T. Friman, Wendy A. Bickmore","doi":"10.1186/s13059-024-03352-1","DOIUrl":"https://doi.org/10.1186/s13059-024-03352-1","url":null,"abstract":"Genome-wide association studies (GWAS) have revealed a multitude of candidate genetic variants affecting the risk of developing complex traits and diseases. However, the highlighted regions are typically in the non-coding genome, and uncovering the functional causative single nucleotide variants (SNVs) is challenging. Prioritization of variants is commonly based on genomic annotation with markers of active regulatory elements, but current approaches still poorly predict functional variants. To address this, we systematically analyze six markers of active regulatory elements for their ability to identify functional variants. We benchmark against molecular quantitative trait loci (molQTL) from assays of regulatory element activity that identify allelic effects on DNA-binding factor occupancy, reporter assay expression, and chromatin accessibility. We identify the combination of DNase footprints and divergent enhancer RNA (eRNA) as markers for functional variants. This signature provides high precision, but with a trade-off of low recall, thus substantially reducing candidate variant sets to prioritize variants for functional validation. We present this as a framework called FINDER—Functional SNV IdeNtification using DNase footprints and eRNA. We demonstrate its utility for prioritizing variants using a leukocyte count trait and analyze variants in linkage disequilibrium with a lead variant to predict a functional variant in asthma. 
Our findings have implications for prioritizing variants from GWAS, in development of predictive scoring algorithms, and for functionally informed fine mapping approaches.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"44 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141895233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology, Pub Date: 2024-08-06, DOI: 10.1186/s13059-024-03362-z
Duc Quang Le, Tien Anh Nguyen, Son Hoang Nguyen, Tam Thi Nguyen, Canh Hao Nguyen, Huong Thanh Phung, Tho Huu Ho, Nam S. Vo, Trang Nguyen, Hoang Anh Nguyen, Minh Duc Cao
{"title":"Efficient inference of large prokaryotic pangenomes with PanTA","authors":"Duc Quang Le, Tien Anh Nguyen, Son Hoang Nguyen, Tam Thi Nguyen, Canh Hao Nguyen, Huong Thanh Phung, Tho Huu Ho, Nam S. Vo, Trang Nguyen, Hoang Anh Nguyen, Minh Duc Cao","doi":"10.1186/s13059-024-03362-z","DOIUrl":"https://doi.org/10.1186/s13059-024-03362-z","url":null,"abstract":"Pangenome inference is an indispensable step in bacterial genomics, yet its scalability poses a challenge due to the rapid growth of genomic collections. This paper presents PanTA, a software package designed for constructing pangenomes of large bacterial datasets, showing unprecedented efficiency levels multiple times higher than existing tools. PanTA introduces a novel mechanism to construct the pangenome progressively without rebuilding the accumulated collection from scratch. The progressive mode is shown to consume orders of magnitude less computational resources than existing solutions in managing growing datasets. The software is open source and is publicly available at https://github.com/amromics/panta and at 10.6084/m9.figshare.23724705 .","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"1 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141895232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology, Pub Date: 2024-08-05, DOI: 10.1186/s13059-024-03353-0
Yawei Li, Yuan Luo
{"title":"STdGCN: spatial transcriptomic cell-type deconvolution using graph convolutional networks","authors":"Yawei Li, Yuan Luo","doi":"10.1186/s13059-024-03353-0","DOIUrl":"https://doi.org/10.1186/s13059-024-03353-0","url":null,"abstract":"Spatially resolved transcriptomics integrates high-throughput transcriptome measurements with preserved spatial cellular organization information. However, many technologies cannot reach single-cell resolution. We present STdGCN, a graph model leveraging single-cell RNA sequencing (scRNA-seq) as reference for cell-type deconvolution in spatial transcriptomic (ST) data. STdGCN incorporates expression profiles from scRNA-seq and spatial localization from ST data for deconvolution. Extensive benchmarking on multiple datasets demonstrates that STdGCN outperforms 17 state-of-the-art models. In a human breast cancer Visium dataset, STdGCN delineates stroma, lymphocytes, and cancer cells, aiding tumor microenvironment analysis. In human heart ST data, STdGCN identifies changes in endothelial-cardiomyocyte communications during tissue development.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"18 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141891853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"scPriorGraph: constructing biosemantic cell–cell graphs with prior gene set selection for cell type identification from scRNA-seq data","authors":"Xiyue Cao, Yu-An Huang, Zhu-Hong You, Xuequn Shang, Lun Hu, Peng-Wei Hu, Zhi-An Huang","doi":"10.1186/s13059-024-03357-w","DOIUrl":"https://doi.org/10.1186/s13059-024-03357-w","url":null,"abstract":"Cell type identification is an indispensable analytical step in single-cell data analyses. To address the high noise stemming from gene expression data, existing computational methods often overlook the biologically meaningful relationships between genes, opting to reduce all genes to a unified data space. We assume that such relationships can aid in characterizing cell type features and improving cell type recognition accuracy. To this end, we introduce scPriorGraph, a dual-channel graph neural network that integrates multi-level gene biosemantics. Experimental results demonstrate that scPriorGraph effectively aggregates feature values of similar cells using high-quality graphs, achieving state-of-the-art performance in cell type identification.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"33 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141891854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology, Pub Date: 2024-08-01, DOI: 10.1186/s13059-024-03335-2
Pooja Kathail, Richard W. Shuai, Ryan Chung, Chun Jimmie Ye, Gabriel B. Loeb, Nilah M. Ioannidis
{"title":"Current genomic deep learning models display decreased performance in cell type-specific accessible regions","authors":"Pooja Kathail, Richard W. Shuai, Ryan Chung, Chun Jimmie Ye, Gabriel B. Loeb, Nilah M. Ioannidis","doi":"10.1186/s13059-024-03335-2","DOIUrl":"https://doi.org/10.1186/s13059-024-03335-2","url":null,"abstract":"A number of deep learning models have been developed to predict epigenetic features such as chromatin accessibility from DNA sequence. Model evaluations commonly report performance genome-wide; however, cis regulatory elements (CREs), which play critical roles in gene regulation, make up only a small fraction of the genome. Furthermore, cell type-specific CREs contain a large proportion of complex disease heritability. We evaluate genomic deep learning models in chromatin accessibility regions with varying degrees of cell type specificity. We assess two modeling directions in the field: general purpose models trained across thousands of outputs (cell types and epigenetic marks) and models tailored to specific tissues and tasks. We find that the accuracy of genomic deep learning models, including two state-of-the-art general purpose models―Enformer and Sei―varies across the genome and is reduced in cell type-specific accessible regions. Using accessibility models trained on cell types from specific tissues, we find that increasing model capacity to learn cell type-specific regulatory syntax―through single-task learning or high capacity multi-task models―can improve performance in cell type-specific accessible regions. We also observe that improving reference sequence predictions does not consistently improve variant effect predictions, indicating that novel strategies are needed to improve performance on variants. Our results provide a new perspective on the performance of genomic deep learning models, showing that performance varies across the genome and is particularly reduced in cell type-specific accessible regions. 
We also identify strategies to maximize performance in cell type-specific accessible regions.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"37 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141862139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}