Genome Research: Latest Articles

Integrative chromatin state annotation of 234 human ENCODE4 cell types using Segway

IF 7, Zone 2 (Biology)

Genome Research Pub Date: 2025-10-06 DOI: 10.1101/gr.280633.125

Marjan Farahbod, Aboud Diab, Paul Sud, Meenakshi S. Kagda, Ian Whaling, Mehdi Foroozandeh, Ishan Goel, Habib Daneshpajouh, Benjamin Hitz, J. Michael Cherry, Maxwell W. Libbrecht

Citations: 0

Corrigendum: A sheep pangenome reveals the spectrum of structural variations and their effects on tail phenotypes

IF 7, Zone 2 (Biology)

Genome Research Pub Date: 2025-10-01 DOI: 10.1101/gr.281340.125

Ran Li, Mian Gong, Xinmiao Zhang, Fei Wang, Zhenyu Liu, Lei Zhang, Qimeng Yang, Yuan Xu, Mengsi Xu, Huanhuan Zhang, Yunfeng Zhang, Xuelei Dai, Yuanpeng Gao, Zhuangbiao Zhang, Wenwen Fang, Yuta Yang, Weiwei Fu, Chunna Cao, Peng Yang, Zeinab Amiri Ghanatsaman, Niloufar Jafarpour Negari, Hojjat Asadollahpour Nanaei, Xiangpeng Yue, Yuxuan Song, Xianyong Lan, Weidong Deng, Xihong Wang, Chuanying Pan, Ruidong Xiang, Eveline M. Ibeagha-Awemu, Pat (J.S.) Heslop-Harrison, Benjamin D. Rosen, Johannes A. Lenstra, Shangquan Gan, Yu Jiang

Citations: 0

Strong bias in long-read sequencing prevents assembly of Drosophila melanogaster Y-linked genes

IF 7, Zone 2 (Biology)

Genome Research Pub Date: 2025-10-01 DOI: 10.1101/gr.280604.125

Antonio Bernardo Carvalho, Bernard Y Kim, Fabiana Uno

Abstract: Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) are generally considered free from sequence composition bias, a key factor - alongside read length - that explains their success in producing high quality genome assemblies. Indeed, there had been very few reports of bias, the clearest one against GA-rich repeats in the human genome. However, our study reveals a systematic failure of both technologies to sequence and assemble specific exons of Drosophila melanogaster genes, indicating an overlooked limitation. Namely, multiple Y-linked exons are nearly or completely absent from raw reads produced by deep sequencing with state-of-the-art ONT (10.4 flow cells, 200× coverage) and PacBio (HiFi 50×). The same exons are accurately assembled using Illumina 67× coverage. We found that these missing exons are consistently located near simple satellite sequences, where sequencing fails at multiple levels: read initiation (very few reads start within satellite regions), read elongation (satellite-containing reads are shorter on average), and base-calling (quality scores drop as sequencing enters a satellite sequence). These findings challenge the assumption that long-read technologies are unbiased and reveal a critical barrier to assembling sequences near repetitive regions. As large-scale sequencing projects move towards telomere-to-telomere assemblies in a wide range of organisms, recognizing and addressing these biases will be important to achieving truly complete and accurate genomes. Additionally, the underrepresented Y-linked exons provide a valuable benchmark for refining those sequencing technologies while improving the assembly of the highly heterochromatic and often neglected Drosophila Y chromosome.

Citations: 0

Highly accurate reference and method selection for universal cross-dataset cell type annotation with CAMUS

IF 7, Zone 2 (Biology)

Genome Research Pub Date: 2025-10-01 DOI: 10.1101/gr.280821.125

Qunlun Shen, Shuqin Zhang, Shihua Zhang

Abstract: Cell type annotation is a critical and essential task in single-cell data analysis. Various reference-based methods have provided rapid annotation for diverse single-cell data. However, how to select the optimal references and methods is often overlooked. To this end, we present a cross-dataset cell-type annotation methodology with a universal reference data and method selection strategy (CAMUS) to achieve highly accurate and efficient annotations. We demonstrate the advantages of CAMUS by conducting comprehensive analyses on 672 pairs of cross-species scRNA-seq datasets. The annotation results with references selected by CAMUS achieved substantial accuracy gains (25.0-124.7%) over random selection strategies across five reference-based methods. CAMUS achieved high accuracy in choosing the best reference-method pair among 3360 pairs (49.1%). Moreover, CAMUS showed high accuracy in selecting the best methods on the 80 scST datasets (82.5%) and five scATAC-seq datasets (100.0%), illustrating its universal applicability. In addition, we utilized the CAMUS score with other metrics to predict the annotation accuracy, providing direct guidance on whether to accept current annotation results.

Citations: 0

Adaptation of centromere to breakage through local genomic and epigenomic remodeling in wheat

IF 7, Zone 2 (Biology)

Genome Research Pub Date: 2025-09-30 DOI: 10.1101/gr.280913.125

Jingwei Zhou, Yuhong Huang, Huan Ma, Yiqian Chen, Chuanye Chen, Fangpu Han, Handong Su

Abstract: Centromeres, characterized by their unique chromatin attributes, are indispensable for safeguarding genomic stability. Due to their intricate and fragile nature, centromeres are susceptible to chromosomal rearrangements. However, the mechanisms preserving their functional integrity and supporting nucleus homeostasis following breakages remained enigmatic. In this study, we use wheat ditelosomic stocks, which arise from centromere breakage, to explore the genetic and epigenetic alterations in damaged centromeres. Our investigations unveil novel chromosome end structures marked by de novo addition of telomeres, as well as localized chromosomal shattering, including segment deletions and duplications near centromere breakpoints. We reveal that the damaged centromeres possess a remarkable capacity for self-regulation, through employing structural modifications such as expansion, contraction, and neocentromere formation to maintain their functional integrity. Centromere breakage triggers nucleosome remodeling and is accompanied by local transcription changes and chromatin reorganization, and subsequently may contribute to the stabilization of broken chromosomes. Our findings highlight the resilience and adaptability of plant chromosomes in response to centromere breakage, and provide valuable insights into the stability of centromeres, thereby offering promising prospects to manipulate centromeres for targeted chromosomal innovation and crop genetic improvement.

Citations: 0

Long-read reconstruction of many diverse haplotypes with devider

IF 7, Zone 2 (Biology)

Genome Research Pub Date: 2025-09-23 DOI: 10.1101/gr.280510.125

Jim Shaw, Christina Boucher, Yun William Yu, Noelle Noyes, Heng Li

Abstract: Reconstructing exact haplotypes is important when sequencing a mixture of similar sequences. Long-read sequencing can connect distant alleles to disentangle similar haplotypes, but handling sequencing errors requires specialized techniques. We present devider, an algorithm for haplotyping small sequences - such as viruses or genes - from long-read sequencing. devider uses a positional de Bruijn graph with sequence-to-graph alignment on an alphabet of informative alleles to provide a fast assembly-inspired approach compatible with various long-read sequencing technologies. On a synthetic Nanopore dataset containing seven HIV strains, devider recovered 97% of the haplotype content and had the most accurate abundance estimates while taking < 4 minutes and 1 GB of memory for > 8000× coverage. Benchmarking on synthetic mixtures of antimicrobial resistance (AMR) genes showed that devider recovered 83% of haplotypes, 23 percentage points higher than the next best method. On real PacBio and Nanopore datasets, devider recapitulates previously known results in seconds, disentangling a bacterial community with > 10 strains and an HIV-1 co-infection dataset. We used devider to investigate the within-host diversity of a long-read bovine gut metagenome enriched for AMR genes, discovering 13 distinct haplotypes for a tet(Q) tetracycline resistance gene with > 18,000× coverage and 6 haplotypes for a CfxA2 beta-lactamase gene. We found clear recombination blocks for these AMR gene haplotypes, showcasing devider's ability to unveil evolutionary signals for heterogeneous mixtures.
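
Illustrative note: the abstract above mentions a positional de Bruijn graph over an alphabet of informative alleles. The sketch below shows only that general idea and is not devider's implementation. It assumes each read has already been reduced to a start index plus a string of alleles at consecutive informative sites; the names `positional_kmers` and `build_graph` and the toy reads are hypothetical. Sequence-to-graph alignment and error handling are omitted.

```python
from collections import Counter

def positional_kmers(read, k):
    """Split one read into (site_index, k-mer) nodes.

    `read` is a hypothetical (start_index, allele_string) pair, where the
    string holds the alleles observed at consecutive informative sites.
    """
    start, alleles = read
    return [(start + i, alleles[i:i + k]) for i in range(len(alleles) - k + 1)]

def build_graph(reads, k):
    """Count node and edge multiplicities of a positional de Bruijn graph."""
    nodes, edges = Counter(), Counter()
    for read in reads:
        kmers = positional_kmers(read, k)
        nodes.update(kmers)
        # Consecutive positional k-mers from the same read form an edge.
        edges.update(zip(kmers, kmers[1:]))
    return nodes, edges

# Toy input: alleles encoded as '0'/'1' at four informative sites.
reads = [(0, "0101"), (0, "0101"), (1, "011"), (0, "1100")]
nodes, edges = build_graph(reads, k=3)
```

Because every node carries its site index, identical allele k-mers at different positions stay distinct, which is what separates a positional graph from an ordinary de Bruijn graph.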

Citations: 0

Deep structural clustering reveals hidden systematic biases in RNA sequencing data

IF 7, Zone 2 (Biology)

Genome Research Pub Date: 2025-09-19 DOI: 10.1101/gr.280713.125

Qiang Su, Yi Long, Deming Gou, Junmin Quan, Xiaoming Zhou, Qizhou Lian

Abstract: RNA sequencing (RNA-seq) is a pivotal tool for transcriptomic analysis, providing comprehensive exploration of gene expression across diverse biological contexts. However, RNA-seq data is susceptible to various biases that can significantly compromise the accuracy and reliability of transcript quantification. This study investigates the influence of high-dimensional RNA structures on local sequencing efficiency using an innovative unsupervised Variational Autoencoder-Gaussian Mixture Model (VAE-GMM). The VAE-GMM effectively captures intricate high-dimensional k-mer structural similarities by learning compact latent representations, which reduces dimensionality while meticulously preserving essential structural features crucial for bias identification. This sophisticated modeling allows precise tracking of local RNA-read conversion dynamics and the identification of complex, often overlooked, bias sources. We rigorously validate the VAE-GMM model's performance and robustness against conventional machine learning techniques, including Gaussian Mixture Models (GMM-only), Principal Component Analysis-based GMMs, k-means clustering, and Hierarchical Clustering. These validations, using an extensive and diverse array of datasets including synthetic RNA constructs, various human cell lines, and authentic tissue samples, consistently demonstrate the model's superior versatility and accuracy across different biological systems. Furthermore, in silico simulations of the sequencing process closely align with actual sequencing data, strongly reinforcing the critical role of high-dimensional RNA structures in determining sequencing efficiency and their impact on data quality. Our findings offer valuable insights into the underlying mechanisms of RNA structure-mediated sequencing bias. This deeper understanding enables more accurate and reliable RNA-seq analyses and is expected to improve the interpretation of transcriptomic data in future genomic studies.

Citations: 0

Recalibrating differential gene expression by genetic dosage variance prioritizes functionally relevant genes

IF 7, Zone 2 (Biology)

Genome Research Pub Date: 2025-09-17 DOI: 10.1101/gr.280360.124

Philipp Rentzsch, Aaron Kollotzek, Kaushik Ram Ganapathy, Pejman Mohammadi, Tuuli Lappalainen

Abstract: Differential expression (DE) analysis is a widely used method for identifying genes that are functionally relevant for an observed phenotype or biological response. However, typical DE analysis includes selection of genes based on a threshold of fold change in expression under the implicit assumption that all genes are equally sensitive to dosage changes of their transcripts. This tends to favor highly variable genes over more constrained genes where even small changes in expression may be biologically relevant. To address this limitation, we have developed a method to recalibrate each gene's DE fold change based on genetic expression variance observed in the human population. The newly established metric ranks statistically differentially expressed genes, not by nominal change of expression, but by relative change in comparison to natural dosage variation for each gene. We apply our method to RNA sequencing data sets from in vitro stimulus response and neuropsychiatric disease experiments. Compared to the standard approach, our method adjusts the bias in discovery toward highly variable genes and enriches for pathways and biological processes related to metabolic and regulatory activity, indicating a prioritization of functionally relevant driver genes. Tissue-specific recalibration increases detection of known disease-relevant processes. Altogether, our method provides a novel view on DE and contributes toward bridging the existing gap between statistical and biological significance. We believe that this approach will simplify the identification of disease-causing molecular processes and enhance the discovery of therapeutic targets.
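
Illustrative note: the abstract above describes ranking differentially expressed genes by their fold change relative to each gene's natural dosage variation. The sketch below shows one simple way such a ranking could look and is not the authors' metric. It assumes the score is the absolute log2 fold change divided by a per-gene standard deviation of genetic dosage; the function name `recalibrated_scores`, the gene names, and all numbers are hypothetical.

```python
def recalibrated_scores(log2_fc, dosage_sd):
    """Return {gene: |log2 fold change| / SD of genetic dosage variation}.

    Assumption for illustration only: the paper's actual recalibration
    may use a different formula. Genes without a positive SD are skipped.
    """
    return {
        gene: abs(fc) / dosage_sd[gene]
        for gene, fc in log2_fc.items()
        if dosage_sd.get(gene, 0) > 0
    }

# Hypothetical inputs: nominal log2 fold changes and per-gene dosage SDs.
log2_fc = {"GENE_A": 2.0, "GENE_B": 0.5, "GENE_C": -1.0}
dosage_sd = {"GENE_A": 1.0, "GENE_B": 0.1, "GENE_C": 0.25}

scores = recalibrated_scores(log2_fc, dosage_sd)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy example GENE_A has the largest nominal fold change, but the tightly constrained GENE_B ranks first once its small natural variation is taken into account.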

Citations: 0

Analyzing the large and complex SFARI autism cohort data using the Genotypes and Phenotypes in Families (GPF) platform

IF 7, Zone 2 (Biology)

Genome Research Pub Date: 2025-09-16 DOI: 10.1101/gr.280356.124

Liubomir Chorbadjiev, Murat Cokol, Zohar Weinstein, Kevin Shi, Christopher Fleisch, Nikolay Dimitrov, Svetlin Mladenov, Ivo Todorov, Iordan Ivanov, Simon Xu, Steven Ford, Yoon-ha Lee, Boris Yamrom, Steven Marks, Adriana Munoz, Alex Lash, Natalia Volfovsky, Ivan Iossifov

Abstract: The exploration of genotypic variants impacting phenotypes is a cornerstone in genetics research. The emergence of vast collections containing deeply genotyped and phenotyped families has made it possible to pursue the search for variants associated with complex diseases. However, managing these large-scale data sets requires specialized computational tools to organize and analyze the extensive data. Genotypes and Phenotypes in Families (GPF) is an open-source platform that manages genotypes and phenotypes derived from collections of families. GPF allows interactive exploration of genetic variants, enrichment analysis for de novo mutations, phenotype/genotype association tools, and secure data sharing. GPF is used to disseminate two family collection data sets, SSC and SPARK, for the study of autism, built by the Simons Foundation. The GPF instance at the Simons Foundation (GPF-SFARI) provides protected access to comprehensive genotypic and phenotypic data for SSC and SPARK. GPF-SFARI also provides public access to an extensive collection of de novo mutations from individuals with autism and related disorders and to gene-level statistics of the protected data sets characterizing the genes’ roles in autism. However, GPF is versatile and can manage genotypic data from other small or large family collections. Here, we highlight the primary features of GPF within the context of GPF-SFARI.

Citations: 0

Long-read sequencing reveals HBV integration patterns and oncogenic impact on early-onset hepatocellular carcinoma

IF 7, Zone 2 (Biology)

Genome Research Pub Date: 2025-09-16 DOI: 10.1101/gr.279889.124

Yao Wang, Dong Yu, Yue Mei, Zhida Fu, Jian Lin, Di Wu, Yuan Yang, Hongli Yan

Abstract: Hepatitis B virus (HBV) integration is a key driver of hepatocellular carcinoma (HCC) occurrence and progression; however, its oncogenic mechanisms remain incompletely understood because of limitations in detection methods and sample availability. In this study, we employed Oxford Nanopore Technologies (ONT) whole-genome sequencing and full-length transcriptome sequencing to characterize HBV integration events at the genomic and transcriptomic levels, along with their regulatory effects on structural variations (SVs) and gene expression. Functional validation was performed using dual-luciferase assays and cell-based experiments. Our findings revealed that integrated HBV sequences form long concatemers, mediating inter- and intrachromosomal recombination in the human genome. Notably, integrated HBV enhancer I (HBV-Enh I) was detected in 6 of 7 tumor tissues and was associated with aberrant gene expression. HBV integration induced oncogenic SVs, such as focal MYC amplification and NAV2 deletion, and directly modulated gene expression. Additionally, ectopic overexpression of MYOCD, driven by HBV-Enh I integration, promoted HCC cell migration and invasion. In summary, HBV integration acts as a major driver of large-scale genomic SVs and transcriptomic dysregulation, through either direct alterations in genome dosage or cis-regulatory mechanisms. HBV-Enh I is frequently integrated in HCC and might play a pivotal role in abnormal gene expression, highlighting its potential as a therapeutic target.

Citations: 0