Genome Biology Pub Date: 2024-08-16 DOI: 10.1186/s13059-024-03356-x
Siyuan Luo, Pierre-Luc Germain, Mark D. Robinson, Ferdinand von Meyenn
{"title":"Benchmarking computational methods for single-cell chromatin data analysis","authors":"Siyuan Luo, Pierre-Luc Germain, Mark D. Robinson, Ferdinand von Meyenn","doi":"10.1186/s13059-024-03356-x","DOIUrl":"https://doi.org/10.1186/s13059-024-03356-x","url":null,"abstract":"Single-cell chromatin accessibility assays, such as scATAC-seq, are increasingly employed in individual and joint multi-omic profiling of single cells. As the accumulation of scATAC-seq and multi-omics datasets continues, challenges in analyzing such sparse, noisy, and high-dimensional data become pressing. Specifically, one challenge relates to optimizing the processing of chromatin-level measurements and efficiently extracting information to discern cellular heterogeneity. This is of critical importance, since the identification of cell types is a fundamental step in current single-cell data analysis practices. We benchmark 8 feature engineering pipelines derived from 5 recent methods to assess their ability to discover and discriminate cell types. By using 10 metrics calculated at the cell embedding, shared nearest neighbor graph, or partition levels, we evaluate the performance of each method at different data processing stages. This comprehensive approach allows us to thoroughly understand the strengths and weaknesses of each method and the influence of parameter selection. Our analysis provides guidelines for choosing analysis methods for different datasets. Overall, feature aggregation, SnapATAC, and SnapATAC2 outperform latent semantic indexing-based methods. For datasets with complex cell-type structures, SnapATAC and SnapATAC2 are preferred. With large datasets, SnapATAC2 and ArchR are most scalable.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"135 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2024-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141991924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology Pub Date: 2024-08-16 DOI: 10.1186/s13059-024-03347-y
Shobana V. Stassen, Minato Kobashi, Edmund Y. Lam, Yuanhua Huang, Joshua W. K. Ho, Kevin K. Tsia
{"title":"StaVia: spatially and temporally aware cartography with higher-order random walks for cell atlases","authors":"Shobana V. Stassen, Minato Kobashi, Edmund Y. Lam, Yuanhua Huang, Joshua W. K. Ho, Kevin K. Tsia","doi":"10.1186/s13059-024-03347-y","DOIUrl":"https://doi.org/10.1186/s13059-024-03347-y","url":null,"abstract":"Single-cell atlases pose daunting computational challenges pertaining to the integration of spatial and temporal information and the visualization of trajectories across large atlases. We introduce StaVia, a computational framework that synergizes multi-faceted single-cell data with higher-order random walks that leverage the memory of cells’ past states, fused with a cartographic Atlas View that offers intuitive graph visualization. This spatially aware cartography captures relationships between cell populations based on their spatial location as well as their gene expression and developmental stage. We demonstrate this using zebrafish gastrulation data, underscoring its potential to dissect complex biological landscapes in both spatial and temporal contexts.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"30 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2024-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141991904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology Pub Date: 2024-08-16 DOI: 10.1186/s13059-024-03345-0
Kai Zhao, Hon-Cheong So, Zhixiang Lin
{"title":"scParser: sparse representation learning for scalable single-cell RNA sequencing data analysis","authors":"Kai Zhao, Hon-Cheong So, Zhixiang Lin","doi":"10.1186/s13059-024-03345-0","DOIUrl":"https://doi.org/10.1186/s13059-024-03345-0","url":null,"abstract":"The rapid rise in the availability and scale of scRNA-seq data calls for scalable methods for integrative analysis. Though many methods for data integration have been developed, few focus on understanding the heterogeneous effects of biological conditions across different cell populations in integrative analysis. Our proposed scalable approach, scParser, models the heterogeneous effects from biological conditions, which unveils the key mechanisms by which gene expression contributes to phenotypes. Notably, the extended scParser pinpoints biological processes in cell subpopulations that contribute to disease pathogenesis. scParser achieves favorable performance in cell clustering compared to state-of-the-art methods and has broad and diverse applicability.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"142 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2024-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141991925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology Pub Date: 2024-08-14 DOI: 10.1186/s13059-024-03368-7
Nathan D. Maulding, Lucas Seninge, Joshua M. Stuart
{"title":"Associating transcription factors to single-cell trajectories with DREAMIT","authors":"Nathan D. Maulding, Lucas Seninge, Joshua M. Stuart","doi":"10.1186/s13059-024-03368-7","DOIUrl":"https://doi.org/10.1186/s13059-024-03368-7","url":null,"abstract":"Inferring gene regulatory networks from single-cell RNA-sequencing trajectories has been an active area of research, yet methods are still needed to identify regulators governing cell transitions. We developed DREAMIT (Dynamic Regulation of Expression Across Modules in Inferred Trajectories) to annotate transcription-factor activity along single-cell trajectory branches, using ensembles of relations to target genes. Using a benchmark representing several different tissues, as well as external validation with ATAC-Seq and Perturb-Seq data on hematopoietic cells, the method was found to have higher tissue-specific sensitivity and specificity than competing approaches.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"40 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2024-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141980908","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology Pub Date: 2024-08-14 DOI: 10.1186/s13059-024-03365-w
William DeGroat, Fumitaka Inoue, Tal Ashuach, Nir Yosef, Nadav Ahituv, Anat Kreimer
{"title":"Comprehensive network modeling approaches unravel dynamic enhancer-promoter interactions across neural differentiation","authors":"William DeGroat, Fumitaka Inoue, Tal Ashuach, Nir Yosef, Nadav Ahituv, Anat Kreimer","doi":"10.1186/s13059-024-03365-w","DOIUrl":"https://doi.org/10.1186/s13059-024-03365-w","url":null,"abstract":"Increasing evidence suggests that a substantial proportion of disease-associated mutations occur in enhancers, regions of non-coding DNA essential to gene regulation. Understanding the structures and mechanisms of the regulatory programs this variation affects can shed light on the apparatuses of human diseases. We collect epigenetic and gene expression datasets from seven early time points during neural differentiation. Focusing on this model system, we construct networks of enhancer-promoter interactions, each at an individual stage of neural induction. These networks serve as the base for a rich series of analyses, through which we demonstrate their temporal dynamics and enrichment for various disease-associated variants. We apply the Girvan-Newman clustering algorithm to these networks to reveal biologically relevant substructures of regulation. Additionally, we demonstrate methods to validate predicted enhancer-promoter interactions using transcription factor overexpression and massively parallel reporter assays. Our findings suggest a generalizable framework for exploring gene regulatory programs and their dynamics across developmental processes; this includes a comprehensive approach to studying the effects of disease-associated variation on transcriptional networks. The techniques applied to our networks have been published alongside our findings as a computational tool, E-P-INAnalyzer. Our procedure can be utilized across different cellular contexts and disorders.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"11 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2024-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141980905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology Pub Date: 2024-08-13 DOI: 10.1186/s13059-024-03364-x
Yi Qiu, Yoon Mo Kang, Christopher Korfmann, Fanny Pouyet, Andrew Eckford, Alexander F. Palazzo
{"title":"The GC-content at the 5′ ends of human protein-coding genes is undergoing mutational decay","authors":"Yi Qiu, Yoon Mo Kang, Christopher Korfmann, Fanny Pouyet, Andrew Eckford, Alexander F. Palazzo","doi":"10.1186/s13059-024-03364-x","DOIUrl":"https://doi.org/10.1186/s13059-024-03364-x","url":null,"abstract":"In vertebrates, most protein-coding genes have a peak of GC-content near their 5′ transcriptional start site (TSS). This feature promotes both the efficient nuclear export and translation of mRNAs. Despite the importance of GC-content for RNA metabolism, its general features, origin, and maintenance remain mysterious. We investigate the evolutionary forces shaping GC-content at the transcriptional start site (TSS) of genes through both comparative genomic analysis of nucleotide substitution rates between different species and by examining human de novo mutations. Our data suggests that GC-peaks at TSSs were present in the last common ancestor of amniotes, and likely that of vertebrates. We observe that in apes and rodents, where recombination is directed away from TSSs by PRDM9, GC-content at the 5′ end of protein-coding genes is currently undergoing mutational decay. In canids, which lack PRDM9 and perform recombination at TSSs, GC-content at the 5′ end of protein-coding genes is increasing. We show that these patterns extend into the 5′ end of the open reading frame, thus impacting synonymous codon position choices. Our results indicate that the dynamics of this GC-peak in amniotes is largely shaped by historic patterns of recombination. Since decay of GC-content towards the mutation rate equilibrium is the default state for non-functional DNA, the observed decrease in GC-content at TSSs in apes and rodents indicates that the GC-peak is not being maintained by selection on most protein-coding genes in those species.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"16 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141973773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology Pub Date: 2024-08-13 DOI: 10.1186/s13059-024-03359-8
Fengqi Wu, Yingxiao Mai, Chengjie Chen, Rui Xia
{"title":"SynGAP: a synteny-based toolkit for gene structure annotation polishing","authors":"Fengqi Wu, Yingxiao Mai, Chengjie Chen, Rui Xia","doi":"10.1186/s13059-024-03359-8","DOIUrl":"https://doi.org/10.1186/s13059-024-03359-8","url":null,"abstract":"Genome sequencing has become a routine task for biologists, but the challenge of gene structure annotation persists, impeding accurate genomic and genetic research. Here, we present a bioinformatics toolkit, SynGAP (Synteny-based Gene structure Annotation Polisher), which uses gene synteny information to accomplish precise and automated polishing of gene structure annotation of genomes. SynGAP offers exceptional capabilities in the improvement of gene structure annotation quality and the profiling of integrative gene synteny between species. Furthermore, an expression variation index is designed for comparative transcriptomics analysis to explore candidate genes responsible for the development of distinct traits observed in phylogenetically related species.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"47 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141973774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology Pub Date: 2024-08-12 DOI: 10.1186/s13059-024-03351-2
Ian A. Mellis, Madeline E. Melzer, Nicholas Bodkin, Yogesh Goyal
{"title":"Prevalence of and gene regulatory constraints on transcriptional adaptation in single cells","authors":"Ian A. Mellis, Madeline E. Melzer, Nicholas Bodkin, Yogesh Goyal","doi":"10.1186/s13059-024-03351-2","DOIUrl":"https://doi.org/10.1186/s13059-024-03351-2","url":null,"abstract":"Cells and tissues have a remarkable ability to adapt to genetic perturbations via a variety of molecular mechanisms. Nonsense-induced transcriptional compensation, a form of transcriptional adaptation, has recently emerged as one such mechanism, in which nonsense mutations in a gene trigger upregulation of related genes, possibly conferring robustness at cellular and organismal levels. However, beyond a handful of developmental contexts and curated sets of genes, no comprehensive genome-wide investigation of this behavior has been undertaken for mammalian cell types and conditions. How the regulatory-level effects of inherently stochastic compensatory gene networks contribute to phenotypic penetrance in single cells remains unclear. We analyze existing bulk and single-cell transcriptomic datasets to uncover the prevalence of transcriptional adaptation in mammalian systems across diverse contexts and cell types. We perform regulon gene expression analyses of transcription factor target sets in both bulk and pooled single-cell genetic perturbation datasets. Our results reveal greater robustness in expression of regulons of transcription factors exhibiting transcriptional adaptation compared to those of transcription factors that do not. Stochastic mathematical modeling of minimal compensatory gene networks qualitatively recapitulates several aspects of transcriptional adaptation, including paralog upregulation and robustness to mutation. Combined with machine learning analysis of network features of interest, our framework offers potential explanations for which regulatory steps are most important for transcriptional adaptation. Our integrative approach identifies several putative hits (genes demonstrating possible transcriptional adaptation) to follow up on experimentally and provides a formal quantitative framework to test and refine models of transcriptional adaptation.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"5 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2024-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141918821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology Pub Date: 2024-08-12 DOI: 10.1186/s13059-024-03350-3
Erkin Alaçamlı, Thijessen Naidoo, Merve N. Güler, Ekin Sağlıcan, Şevval Aktürk, Igor Mapelli, Kıvılcım Başak Vural, Mehmet Somel, Helena Malmström, Torsten Günther
{"title":"READv2: advanced and user-friendly detection of biological relatedness in archaeogenomics","authors":"Erkin Alaçamlı, Thijessen Naidoo, Merve N. Güler, Ekin Sağlıcan, Şevval Aktürk, Igor Mapelli, Kıvılcım Başak Vural, Mehmet Somel, Helena Malmström, Torsten Günther","doi":"10.1186/s13059-024-03350-3","DOIUrl":"https://doi.org/10.1186/s13059-024-03350-3","url":null,"abstract":"The advent of genome-wide ancient DNA analysis has revolutionized our understanding of prehistoric societies. However, studying biological relatedness in these groups requires tailored approaches due to the challenges of analyzing ancient DNA. READv2, an optimized Python3 implementation of the most widely used tool for this purpose, addresses these challenges while surpassing its predecessor in speed and accuracy. For sufficient amounts of data, it can classify up to third-degree relatedness and differentiate between the two types of first-degree relatedness, full siblings and parent-offspring. READv2 enables user-friendly, efficient, and nuanced analysis of biological relatedness, facilitating a deeper understanding of past social structures.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"84 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2024-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141918824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology Pub Date: 2024-08-09 DOI: 10.1186/s13059-024-03343-2
Pelin Icer Baykal, Paweł Piotr Łabaj, Florian Markowetz, Lynn M. Schriml, Daniel J. Stekhoven, Serghei Mangul, Niko Beerenwinkel
{"title":"Genomic reproducibility in the bioinformatics era","authors":"Pelin Icer Baykal, Paweł Piotr Łabaj, Florian Markowetz, Lynn M. Schriml, Daniel J. Stekhoven, Serghei Mangul, Niko Beerenwinkel","doi":"10.1186/s13059-024-03343-2","DOIUrl":"https://doi.org/10.1186/s13059-024-03343-2","url":null,"abstract":"In biomedical research, validating a scientific discovery hinges on the reproducibility of its experimental results. However, in genomics, the definition and implementation of reproducibility remain imprecise. We argue that genomic reproducibility, defined as the ability of bioinformatics tools to maintain consistent results across technical replicates, is essential for advancing scientific knowledge and medical applications. Initially, we examine different interpretations of reproducibility in genomics to clarify terms. Subsequently, we discuss the impact of bioinformatics tools on genomic reproducibility and explore methods for evaluating these tools regarding their effectiveness in ensuring genomic reproducibility. Finally, we recommend best practices to improve genomic reproducibility.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"29 1","pages":""},"PeriodicalIF":12.3,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141909013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}