Genome Biology Pub Date: 2024-09-16 DOI: 10.1186/s13059-024-03386-5
Sarah M. Goggin, Eli R. Zunder
{"title":"ESCHR: a hyperparameter-randomized ensemble approach for robust clustering across diverse datasets","authors":"Sarah M. Goggin, Eli R. Zunder","doi":"10.1186/s13059-024-03386-5","DOIUrl":"https://doi.org/10.1186/s13059-024-03386-5","url":null,"abstract":"Clustering is widely used for single-cell analysis, but current methods are limited in accuracy, robustness, ease of use, and interpretability. To address these limitations, we developed an ensemble clustering method that outperforms other methods at hard clustering without the need for hyperparameter tuning. It also performs soft clustering to characterize continuum-like regions and quantify clustering uncertainty, demonstrated here by mapping the connectivity and intermediate transitions between MNIST handwritten digits and between hypothalamic tanycyte subpopulations. This hyperparameter-randomized ensemble approach improves the accuracy, robustness, ease of use, and interpretability of single-cell clustering, and may prove useful in other fields as well.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":null,"pages":null},"PeriodicalIF":12.3,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142234448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology Pub Date: 2024-09-16 DOI: 10.1186/s13059-024-03379-4
Kuan-Hao Chao, Alan Mao, Steven L. Salzberg, Mihaela Pertea
{"title":"Splam: a deep-learning-based splice site predictor that improves spliced alignments","authors":"Kuan-Hao Chao, Alan Mao, Steven L. Salzberg, Mihaela Pertea","doi":"10.1186/s13059-024-03379-4","DOIUrl":"https://doi.org/10.1186/s13059-024-03379-4","url":null,"abstract":"The process of splicing messenger RNA to remove introns plays a central role in creating genes and gene variants. We describe Splam, a novel method for predicting splice junctions in DNA using deep residual convolutional neural networks. Unlike previous models, Splam looks at a 400-base-pair window flanking each splice site, reflecting the biological splicing process that relies primarily on signals within this window. Splam also trains on donor and acceptor pairs together, mirroring how the splicing machinery recognizes both ends of each intron. Compared to SpliceAI, Splam is consistently more accurate, achieving 96% accuracy in predicting human splice junctions.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":null,"pages":null},"PeriodicalIF":12.3,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142234461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology Pub Date: 2024-09-16 DOI: 10.1186/s13059-024-03388-3
Yueqi Tao, Wenfei Xian, Zhigui Bao, Fernando A. Rabanal, Andrea Movilli, Christa Lanz, Gautam Shirsekar, Detlef Weigel
{"title":"Atlas of telomeric repeat diversity in Arabidopsis thaliana","authors":"Yueqi Tao, Wenfei Xian, Zhigui Bao, Fernando A. Rabanal, Andrea Movilli, Christa Lanz, Gautam Shirsekar, Detlef Weigel","doi":"10.1186/s13059-024-03388-3","DOIUrl":"https://doi.org/10.1186/s13059-024-03388-3","url":null,"abstract":"Telomeric repeat arrays at the ends of chromosomes are highly dynamic in composition, but their repetitive nature and technological limitations have made it difficult to assess their true variation in genome diversity surveys. We have comprehensively characterized the sequence variation immediately adjacent to the canonical telomeric repeat arrays at the very ends of chromosomes in 74 genetically diverse Arabidopsis thaliana accessions. We first describe several types of distinct telomeric repeat units and then identify evolutionary processes such as local homogenization and higher-order repeat formation that shape diversity of chromosome ends. By comparing largely isogenic samples, we also determine repeat number variation of the degenerate and variant telomeric repeat array at both the germline and somatic levels. Finally, our analysis of haplotype structure uncovers chromosome end-specific patterns in the distribution of variant telomeric repeats, and their linkage to the more proximal non-coding region. Our findings illustrate the spectrum of telomeric repeat variation at multiple levels in A. thaliana—in germline and soma, across all chromosome ends, and across genetic groups—thereby expanding our knowledge of the evolution of chromosome ends.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":null,"pages":null},"PeriodicalIF":12.3,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142234447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dimension reduction, cell clustering, and cell–cell communication inference for single-cell transcriptomics with DcjComm","authors":"Qian Ding, Wenyi Yang, Guangfu Xue, Hongxin Liu, Yideng Cai, Jinhao Que, Xiyun Jin, Meng Luo, Fenglan Pang, Yuexin Yang, Yi Lin, Yusong Liu, Haoxiu Sun, Renjie Tan, Pingping Wang, Zhaochun Xu, Qinghua Jiang","doi":"10.1186/s13059-024-03385-6","DOIUrl":"https://doi.org/10.1186/s13059-024-03385-6","url":null,"abstract":"Advances in single-cell transcriptomics provide an unprecedented opportunity to explore complex biological processes. However, computational methods for analyzing single-cell transcriptomics still have room for improvement especially in dimension reduction, cell clustering, and cell–cell communication inference. Herein, we propose a versatile method, named DcjComm, for comprehensive analysis of single-cell transcriptomics. DcjComm detects functional modules to explore expression patterns and performs dimension reduction and clustering to discover cellular identities by the non-negative matrix factorization-based joint learning model. DcjComm then infers cell–cell communication by integrating ligand-receptor pairs, transcription factors, and target genes. DcjComm demonstrates superior performance compared to state-of-the-art methods.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":null,"pages":null},"PeriodicalIF":12.3,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142158747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology Pub Date: 2024-09-06 DOI: 10.1186/s13059-024-03381-w
Kirsten Seale, Andrew Teschendorff, Alexander P. Reiner, Sarah Voisin, Nir Eynon
{"title":"A comprehensive map of the aging blood methylome in humans","authors":"Kirsten Seale, Andrew Teschendorff, Alexander P. Reiner, Sarah Voisin, Nir Eynon","doi":"10.1186/s13059-024-03381-w","DOIUrl":"https://doi.org/10.1186/s13059-024-03381-w","url":null,"abstract":"During aging, the human methylome undergoes both differential and variable shifts, accompanied by increased entropy. The distinction between variably methylated positions (VMPs) and differentially methylated positions (DMPs), their contribution to epigenetic age, and the role of cell type heterogeneity remain unclear. We conduct a comprehensive analysis of > 32,000 human blood methylomes from 56 datasets (age range = 6–101 years). We find a significant proportion of the blood methylome that is differentially methylated with age (48% DMPs; FDR < 0.005) and variably methylated with age (37% VMPs; FDR < 0.005), with considerable overlap between the two groups (59% of DMPs are VMPs). Bivalent and Polycomb regions become increasingly methylated and divergent between individuals, while quiescent regions lose methylation more uniformly. Both chronological and biological clocks, but not pace-of-aging clocks, show a strong enrichment for CpGs undergoing both mean and variance changes during aging. The accumulation of DMPs shifting towards a methylation fraction of 50% drives the increase in entropy, smoothening the epigenetic landscape. However, approximately a quarter of DMPs exhibit anti-entropic effects, opposing this direction of change. While changes in cell type composition minimally affect DMPs, VMPs and entropy measurements are moderately sensitive to such alterations. This study represents the largest investigation to date of genome-wide DNA methylation changes and aging in a single tissue, providing valuable insights into primary molecular changes relevant to chronological and biological aging.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":null,"pages":null},"PeriodicalIF":12.3,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142142592","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology Pub Date: 2024-09-04 DOI: 10.1186/s13059-024-03387-4
Alessandro Vinceti, Rafaele M. Iannuzzi, Isabella Boyle, Lucia Trastulla, Catarina D. Campbell, Francisca Vazquez, Joshua M. Dempster, Francesco Iorio
{"title":"Author Correction: A benchmark of computational methods for correcting biases of established and unknown origin in CRISPR-Cas9 screening data","authors":"Alessandro Vinceti, Rafaele M. Iannuzzi, Isabella Boyle, Lucia Trastulla, Catarina D. Campbell, Francisca Vazquez, Joshua M. Dempster, Francesco Iorio","doi":"10.1186/s13059-024-03387-4","DOIUrl":"https://doi.org/10.1186/s13059-024-03387-4","url":null,"abstract":"<p><b>Correction: Genome Biol 25, 192 (2024)</b></p><p><b>https://doi.org/10.1186/s13059-024-03336-1</b></p><br/><p>Following publication of the original article [1], the authors identified an omission in the competing interests section. The omitted text is given in bold below.</p><p><b>Competing interests</b></p><p>FI receives funding from Open Targets, a public-private initiative involving academia and industry and performs consultancy for the joint CRUK-AstraZeneca Functional Genomics Centre and for Mosaic TX. JD is a consultant for and holds equity in Jumble Therapeutics. CDC performs consultancy for Droplet Biosciences and is a shareholder of Novartis. <b>FV receives research support from the Dependency Map Consortium, Riva Therapeutics, Bristol Myers Squibb, Merck, Illumina, and Deerfield Management. FV is on the scientific advisory board of GSK, is a consultant and holds equity in Riva Therapeutics and is a co-founder and holds equity in Jumble Therapeutics</b>. All other authors declare that they have no competing interests.</p><p>The original article [1] is corrected.</p><ol data-track-component=\"outbound reference\" data-track-context=\"references section\"><li data-counter=\"1.\"><p>Vinceti A, Iannuzzi RM, Boyle I, et al. A benchmark of computational methods for correcting biases of established and unknown origin in CRISPR-Cas9 screening data. Genome Biol. 2024;25:192. https://doi.org/10.1186/s13059-024-03336-1.</p></li></ol><h3>Authors and Affiliations</h3><ol><li><p>Computational Biology Research Centre, Human Technopole, Milan, Italy</p><p>Alessandro Vinceti, Rafaele M. Iannuzzi, Lucia Trastulla & Francesco Iorio</p></li><li><p>Broad Institute of Harvard and MIT, Cambridge, MA, USA</p><p>Isabella Boyle, Catarina D. Campbell, Francisca Vazquez & Joshua M. Dempster</p></li></ol>","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":null,"pages":null},"PeriodicalIF":12.3,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142130839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology Pub Date: 2024-09-03 DOI: 10.1186/s13059-024-03377-6
Sean M. Flynn, Somdutta Dhir, Krzysztof Herka, Colm Doyle, Larry Melidis, Angela Simeone, Winnie W. I. Hui, Rafael de Cesaris Araujo Tavares, Stefan Schoenfelder, David Tannahill, Shankar Balasubramanian
{"title":"Improved simultaneous mapping of epigenetic features and 3D chromatin structure via ViCAR","authors":"Sean M. Flynn, Somdutta Dhir, Krzysztof Herka, Colm Doyle, Larry Melidis, Angela Simeone, Winnie W. I. Hui, Rafael de Cesaris Araujo Tavares, Stefan Schoenfelder, David Tannahill, Shankar Balasubramanian","doi":"10.1186/s13059-024-03377-6","DOIUrl":"https://doi.org/10.1186/s13059-024-03377-6","url":null,"abstract":"Methods to measure chromatin contacts at genomic regions bound by histone modifications or proteins are important tools to investigate chromatin organization. However, such methods do not capture the possible involvement of other epigenomic features such as G-quadruplex DNA secondary structures (G4s). To bridge this gap, we introduce ViCAR (viewpoint HiCAR), for the direct antibody-based capture of chromatin interactions at folded G4s. Through ViCAR, we showcase the first G4-3D interaction landscape. Using histone marks, we also demonstrate how ViCAR improves on earlier approaches yielding increased signal-to-noise. ViCAR is a practical and powerful tool to explore epigenetic marks and 3D genome interactomes.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":null,"pages":null},"PeriodicalIF":12.3,"publicationDate":"2024-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142123681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology Pub Date: 2024-09-03 DOI: 10.1186/s13059-024-03376-7
Brennan H. Baker, Sheela Sathyanarayana, Adam A. Szpiro, James W. MacDonald, Alison G. Paquette
{"title":"RNAseqCovarImpute: a multiple imputation procedure that outperforms complete case and single imputation differential expression analysis","authors":"Brennan H. Baker, Sheela Sathyanarayana, Adam A. Szpiro, James W. MacDonald, Alison G. Paquette","doi":"10.1186/s13059-024-03376-7","DOIUrl":"https://doi.org/10.1186/s13059-024-03376-7","url":null,"abstract":"Missing covariate data is a common problem that has not been addressed in observational studies of gene expression. Here, we present a multiple imputation method that accommodates high dimensional gene expression data by incorporating principal component analysis of the transcriptome into the multiple imputation prediction models to avoid bias. Simulation studies using three datasets show that this method outperforms complete case and single imputation analyses at uncovering true positive differentially expressed genes, limiting false discovery rates, and minimizing bias. This method is easily implemented via an R Bioconductor package, RNAseqCovarImpute, that integrates with the limma-voom pipeline for differential expression analysis.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":null,"pages":null},"PeriodicalIF":12.3,"publicationDate":"2024-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142123682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology Pub Date: 2024-09-02 DOI: 10.1186/s13059-024-03374-9
Olivier B. Poirion, Wulin Zuo, Catrina Spruce, Candice N. Baker, Sandra L. Daigle, Ashley Olson, Daniel A. Skelly, Elissa J. Chesler, Christopher L. Baker, Brian S. White
{"title":"Enhlink infers distal and context-specific enhancer–promoter linkages","authors":"Olivier B. Poirion, Wulin Zuo, Catrina Spruce, Candice N. Baker, Sandra L. Daigle, Ashley Olson, Daniel A. Skelly, Elissa J. Chesler, Christopher L. Baker, Brian S. White","doi":"10.1186/s13059-024-03374-9","DOIUrl":"https://doi.org/10.1186/s13059-024-03374-9","url":null,"abstract":"Enhlink is a computational tool for scATAC-seq data analysis, facilitating precise interrogation of enhancer function at the single-cell level. It employs an ensemble approach incorporating technical and biological covariates to infer condition-specific regulatory DNA linkages. Enhlink can integrate multi-omic data for enhanced specificity, when available. Evaluation with simulated and real data, including multi-omic datasets from the mouse striatum and novel promoter capture Hi-C data, demonstrates that Enhlink outperforms alternative methods. Coupled with eQTL analysis, it identified a putative super-enhancer in striatal neurons. Overall, Enhlink offers accuracy, power, and potential for revealing novel biological insights in gene regulation.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":null,"pages":null},"PeriodicalIF":12.3,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142118216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}