Genome Research: latest articles, page 3

Quadrupia provides a comprehensive catalog of G-quadruplexes across genomes from the tree of life

IF 7, Zone 2 (Biology)

Genome research Pub Date : 2025-08-26 DOI: 10.1101/gr.279790.124

Nikol Chantzi, Akshatha Nayak, Fotis A. Baltoumas, Eleni Aplakidou, Shiau Wei Liew, Jesslyn Elvaretta Galuh, Michail Patsakis, Austin Montgomery, Camille Moeckel, Ioannis Mouratidis, Saiful Arefeen Sazed, Wilfried Guiblet, Panagiotis Karmiris-Obratański, Guliang Wang, Apostolos Zaravinos, Karen M. Vasquez, Chun Kit Kwok, Georgios A. Pavlopoulos, Ilias Georgakopoulos-Soares

Abstract: G-quadruplex DNA structures exhibit a profound influence on essential biological processes, including transcription, replication, telomere maintenance, and genomic stability. These structures have demonstrably shaped organismal evolution. However, a comprehensive, organism-wide G-quadruplex map encompassing the diversity of life has remained elusive. Here, we introduce Quadrupia, the most extensive and well-characterized G-quadruplex database to date, facilitating the exploration of G-quadruplex structures across the evolutionary spectrum. Quadrupia has identified G-quadruplex sequences in 108,449 reference genomes, with a total of 140,181,277 G-quadruplexes. The database also hosts a collection of 319,784 G-quadruplex clusters of 20 or more members, annotated by taxonomic distributions, multiple sequence alignments, profile hidden Markov models, and cross-references to G-quadruplex 3D structures. Examination of G-quadruplexes across functional genomic elements in different taxa indicates preferential orientation and positioning, with significant differences between individual taxonomic groups. For example, we find that G-quadruplexes in bacteria with a single replication origin display a profound preference for the leading orientation. Finally, we experimentally validate the most frequently observed G-quadruplexes using CD spectroscopy, UV melting, and fluorescence-based approaches.
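The abstract does not say how Quadrupia detects G-quadruplex sequences. As a rough illustration only, the sketch below scans a sequence for the widely used consensus motif G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+ on the forward strand, and for its C-rich mirror image as a proxy for the reverse strand. The function name `find_g4_motifs`, the loop-length bounds, and the choice of this motif are assumptions made for the example, not the database's actual pipeline.

```python
import re

# Consensus motif (an assumption, not taken from the Quadrupia abstract):
# four runs of >= 3 guanines separated by loops of 1-7 nucleotides.
G4_PLUS = re.compile(r"(?:G{3,}[ACGT]{1,7}){3}G{3,}")
# The same motif on the reverse strand appears as C-runs on the forward strand.
G4_MINUS = re.compile(r"(?:C{3,}[ACGT]{1,7}){3}C{3,}")


def find_g4_motifs(sequence: str) -> list[tuple[int, int, str]]:
    """Return (start, end, strand) for non-overlapping motif matches.

    Coordinates are 0-based, end-exclusive, on the forward strand.
    """
    seq = sequence.upper()
    hits = [(m.start(), m.end(), "+") for m in G4_PLUS.finditer(seq)]
    hits += [(m.start(), m.end(), "-") for m in G4_MINUS.finditer(seq)]
    return sorted(hits)


if __name__ == "__main__":
    for hit in find_g4_motifs("TTGGGAGGGTGGGAGGGTT"):
        print(hit)
```

A regular-expression scan like this reports only non-overlapping matches and ignores structural stability, so it is best read as a first-pass filter rather than a predictor.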

Citations: 0

Robust 16S rRNA classification based on a compressed LCA index

IF 7, Zone 2 (Biology)

Genome research Pub Date : 2025-08-25 DOI: 10.1101/gr.279846.124

Omar Y. Ahmed, Christina Boucher, Ben Langmead

Abstract: Taxonomic sequence classification is a computational problem central to the study of metagenomics and evolution. Advances in compressed indexing with the r-index enable full-text pattern matching against large sequence collections. But the data structures that link pattern sequences to their clades of origin still do not scale well to large collections. Previous work proposed the document array profiles, which use O(rd) words of space, where r is the number of maximal equal-letter runs in the Burrows-Wheeler transform and d is the number of distinct genomes. The linear dependence on d is limiting, since real taxonomies can easily contain 10,000s of leaves or more. We propose a method called cliff compression that reduces this size by a large factor, over 250× when indexing the SILVA 16S rRNA gene database. This method uses Θ(r log d) words of space in expectation under a random model we propose here. We implemented these ideas in an open source tool called Cliffy that performs efficient taxonomic classification of sequencing reads with respect to a compressed taxonomic index. When applied to simulated 16S rRNA reads, Cliffy's read-level accuracy is higher than Kraken2's by 11-18%. Clade abundances are also more accurately predicted by Cliffy compared to Kraken2 and Bracken. Overall, Cliffy is a fast and space-economical extension to compressed full-text indexes, enabling them to perform fast and accurate taxonomic classification queries. Cliffy's accuracy underscores the advantages of full-text indexes, which offer a more precise solution compared to k-mer indexes designed for a specific k value.
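To make the quantity r in the space bounds concrete, the following sketch computes a Burrows-Wheeler transform naively and counts its maximal equal-letter runs. It also compares the two word counts from the abstract with all constant factors dropped. The function names are hypothetical, the rotation-sorting construction is only suitable for toy inputs, and nothing here reflects how Cliffy itself is implemented.

```python
import math


def bwt(text: str) -> str:
    """Naive BWT: sort all rotations of text + '$' and take the last column."""
    if "$" in text:
        raise ValueError("text must not contain the sentinel '$'")
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_runs(s: str) -> int:
    """Number of maximal runs of equal letters (the r in the bounds above)."""
    return sum(1 for i, ch in enumerate(s) if i == 0 or ch != s[i - 1])


def profile_words(r: int, d: int) -> int:
    """O(rd) bound for document array profiles, constants omitted."""
    return r * d


def cliff_words(r: int, d: int) -> float:
    """Theta(r log d) expected bound for cliff compression, constants omitted."""
    return r * math.log2(d)


if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed, count_runs(transformed))
```

Because the constants are omitted, the ratio of the two functions indicates only how the gap grows with d; it is not a prediction of the 250× reduction reported for SILVA.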

Citations: 0

Tree-based differential testing using inferential uncertainty for RNA-seq

IF 7, Zone 2 (Biology)

Genome research Pub Date : 2025-08-21 DOI: 10.1101/gr.279981.124

Noor P Singh, Euphy Wu, Jason Fan, Michael I Love, Rob Patro

Abstract: Identifying differentially expressed transcripts poses a crucial yet challenging problem in transcriptomics. Substantial uncertainty is associated with the abundance estimates of certain transcripts which, if ignored, can lead to the exaggeration of false positives and, if included, may lead to reduced power. Given a set of RNA-seq samples, TreeTerminus arranges transcripts in a hierarchical tree structure that encodes different layers of resolution for interpretation of the abundance of transcriptional groups, with uncertainty generally decreasing as one ascends the tree from the leaves. We introduce mehenDi, which utilizes the tree structure from TreeTerminus for differential testing. The nodes output by mehenDi, called the selected nodes, are determined in a data-driven manner to maximize the signal that can be extracted from the data while controlling for the uncertainty associated with estimating the transcript abundances. The identified selected nodes can include transcripts and inner nodes, with no two nodes having an ancestor/descendant relationship. We evaluated our method on both simulated and experimental datasets, comparing its performance with other tree-based differential methods as well as with uncertainty-aware differential transcript/gene expression methods. Our method detects inner nodes that show a strong signal for differential expression, which would have been overlooked when analyzing the transcripts alone.

Citations: 0

Estimating the size of long tandem repeat expansions from short reads with ScatTR

IF 7, Zone 2 (Biology)

Genome research Pub Date : 2025-08-21 DOI: 10.1101/gr.280563.125

Rashid Al-Abri, Gamze Gursoy

Abstract: Tandem repeats (TRs) are sequences of DNA where two or more base pairs are repeated back-to-back at specific locations in the genome. TR expansions, where the number of repeat units exceeds the normal range, have been implicated in over 50 conditions. However, accurately measuring the copy number of TRs is challenging, especially when their expansions are larger than the fragment sizes used in standard short-read genome sequencing. Here, we introduce ScatTR, a novel computational method that leverages a maximum likelihood framework to estimate the copy number of large TR expansions from short-read sequencing data. ScatTR calculates the likelihood of different alignments between sequencing reads and reference sequences that represent various TR lengths and employs a Monte Carlo technique to find the best match. In simulated data, ScatTR outperforms state-of-the-art methods, particularly for TRs with longer motifs and those with lengths that greatly exceed typical sequencing fragment sizes. When applied to data from the 1000 Genomes Project, ScatTR detects potential large TR expansions that other methods missed, highlighting its ability to better characterize genome-wide TR variation.

Citations: 0

Deciphering context-specific gene programs from single-cell and spatial transcriptomics data with DeCEP

IF 7, Zone 2 (Biology)

Genome research Pub Date : 2025-08-21 DOI: 10.1101/gr.279689.124

Lin Li, Xianbin Su, Ze-Guang Han

Abstract: Functional gene programs play a wide range of roles in health and disease by orchestrating transcriptional coregulation to govern cell identity. Understanding these intricate gene programs is essential for unraveling the complexities of biological systems; however, deciphering them remains a significant challenge. Recent advancements in single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics (ST) technologies have empowered the comprehensive characterization of gene programs at both single-cell and spatial resolutions. Here, we present DeCEP, a computational framework designed to characterize context-specific gene programs using scRNA-seq and ST data. DeCEP leverages functional gene lists and directed graphs to construct functional networks underlying distinct cellular or spatial contexts. It then identifies context-dependent hub genes associated with specific gene programs based on network topology and assigns gene program activity to individual cells or spatial locations. Through evaluation on both simulated and real biological datasets, DeCEP demonstrates complementary strengths over existing methods by enabling more fine-grained characterization of gene programs within specific contexts, particularly those characterized by pronounced transcriptional heterogeneity. Furthermore, we showcase the ability of DeCEP in elucidating biological insights through case studies on normal liver tissue, Alzheimer's disease, and cancer.

Citations: 0

Accurate detection of tandem repeats from error-prone sequences with EquiRep

IF 7, Zone 2 (Biology)

Genome research Pub Date : 2025-08-21 DOI: 10.1101/gr.280750.125

Zhezheng Song, Tasfia Zahin, Xiang Li, Mingfu Shao

Abstract: A tandem repeat is a sequence of nucleotides that appear as multiple contiguous, near-identical copies arranged consecutively. Tandem repeats are widespread across natural genomes, play critical roles in genetic diversity and gene regulation, and are associated with various neurological and developmental disorders. They can also arise in sequencing reads generated by certain technologies, such as those used for sequencing circular molecules. A key challenge in analyzing tandem repeats is reconstructing the sequence of the underlying repeat unit. While several methods exist, they often exhibit low accuracy when the repeat unit length increases or the number of copies is low. Furthermore, methods capable of handling highly mutated sequences remain scarce, highlighting a significant opportunity for improvement. We introduce EquiRep, a tool for accurate detection of tandem repeats from erroneous sequences. EquiRep estimates the likelihood of positions originating from the same location in the unit through self-alignment, followed by a novel refinement approach. The resulting equivalence classes and consecutive position information are then used to build a weighted graph. A cycle in this graph with maximum bottleneck weight covering most nucleotide positions is identified to reconstruct the repeat unit. We test EquiRep on two applications, identifying repeat units from satellite DNAs and reconstructing circular RNAs from rolling-circular long-read sequencing data, using both simulated and raw sequencing datasets. Our results show that EquiRep consistently outperforms or matches state-of-the-art methods, demonstrating robustness to sequencing errors and superior performance on long repeat units and low-frequency repeats. These capabilities underscore EquiRep's broad utility in tandem repeat analysis.

Citations: 0

Pangenome-based genome inference using integer programming

IF 7, Zone 2 (Biology)

Genome research Pub Date : 2025-08-21 DOI: 10.1101/gr.280567.125

Ghanshyam Chandra, Md Helal Hossen, Stephan Scholz, Alexander T Dilthey, Daniel Gibney, Chirag Jain

Abstract: Affordable genotyping methods are essential in genomics. Commonly used genotyping methods primarily support single nucleotide variants and short indels but neglect structural variants. Additionally, accuracy of read alignments to a reference genome is unreliable in highly polymorphic and repetitive regions, further impacting genotyping performance. Recent works highlight the advantage of haplotype-resolved pangenome graphs in addressing these challenges. Building on these developments, we propose a rigorous alignment-free genotyping method. Our optimization framework identifies a path through the pangenome graph that maximizes the matches between the path and substrings of sequencing reads (e.g., k-mers) while minimizing recombination events (haplotype switches) along the path. We prove that this problem is NP-Hard and develop efficient integer-programming solutions. We benchmarked the algorithm using downsampled short-read datasets from homozygous human cell lines with coverage ranging from 0.1× to 10×. Our algorithm accurately estimates complete major histocompatibility complex (MHC) haplotype sequences with small edit distances from the ground-truth sequences, providing a significant advantage over existing methods on low-coverage inputs. While this algorithm is designed for haploid genomes, we discuss directions for extending it to diploid genotyping.
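The trade-off described in this abstract, rewarding matches while penalizing haplotype switches, can be illustrated with a deliberately simplified model. The sketch below assumes the graph has been cut into consecutive segments, that each haplotype has an independent, additive match score per segment, and that every switch costs a fixed penalty. Under those assumptions a dynamic program suffices; the paper's actual problem is NP-hard and is solved with integer programming, so this is not the authors' algorithm. The names `best_haplotype_path`, `scores`, and `switch_penalty` are hypothetical.

```python
def best_haplotype_path(
    scores: list[list[float]], switch_penalty: float
) -> tuple[float, list[int]]:
    """Pick one haplotype per segment to maximize matches minus switch costs.

    scores[h][i] is the (assumed additive) match score of haplotype h on
    segment i. Returns the best total and the chosen haplotype per segment.
    """
    n_haps, n_segs = len(scores), len(scores[0])
    best = [scores[h][0] for h in range(n_haps)]
    backpointers: list[list[int]] = []

    for i in range(1, n_segs):
        # Best haplotype to switch away from at the previous segment.
        top = max(range(n_haps), key=lambda h: best[h])
        new_best, pointers = [], []
        for h in range(n_haps):
            stay, switch = best[h], best[top] - switch_penalty
            prev, value = (h, stay) if stay >= switch else (top, switch)
            new_best.append(value + scores[h][i])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)

    # Trace back from the best final haplotype.
    end = max(range(n_haps), key=lambda h: best[h])
    path = [end]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[end], path


if __name__ == "__main__":
    toy_scores = [[5, 5, 0, 0], [0, 0, 5, 5]]
    print(best_haplotype_path(toy_scores, switch_penalty=2))
```

With a small penalty the toy example switches haplotypes once in the middle; raising the penalty makes it stay on a single haplotype, which mirrors the role of the recombination term in the objective.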

Citations: 0

High-quality assembly of the Chinese white truffle genome and recalibrated divergence time estimate provide insight into the evolutionary dynamics of Tuberaceae

IF 7, Zone 2 (Biology)

Genome research Pub Date : 2025-08-21 DOI: 10.1101/gr.280368.124

Jacopo Martelossi, Jacopo Vujovic, Yue Huang, Alessia Tatti, Kaiwei Xu, Federico Puliga, Yuanxue Chen, Omar Rota Stabelli, Fabrizio Ghiselli, Xiaoping Zhang, Alessandra Zambonelli

Abstract: The genus Tuber (family: Tuberaceae) includes the most economically valuable ectomycorrhizal (ECM), truffle-forming fungi. Previous genomic analyses revealed that massive transposable element (TE) proliferation represents a convergent genomic feature of ECM fungi, including Tuberaceae. Repetitive sequences constitute a principal driver of genome evolution shaping its architecture and regulatory networks. In this context, Tuberaceae can become an important model system to study their genomic impact; however, the family lacks high-quality assemblies. Here, we investigate the interplay between TEs and Tuberaceae genome evolution by producing a highly contiguous assembly for the endangered Chinese white truffle Tuber panzhihuanense, along with a recalibrated timeline for Tuberaceae diversification and comprehensive comparative genomic analyses. We find that, concurrently with a Paleogene diversification of the family, pre-existing Chromoviridae-related Gypsy clades independently expanded in different truffle lineages, leading to increased genome size and high gene family turnover rates, but without resulting in highly rearranged genomes. Additionally, we uncover a significant enrichment of ECM-induced gene families stemming from ancestral duplication events. Finally, we explore the repetitive structure of nuclear ribosomal DNA (rDNA) loci for the first time in the clade. Most of the 45S rDNA paralogues are undergoing concerted evolution, though an isolated divergent locus raises concerns about potential issues for metabarcoding and biodiversity assessments. Our study establishes a fundamental genomic resource for future research on truffle genomics and showcases a clear example of how establishment and self-perpetuating expansion of heterochromatin can drive massive genome size variation due to activity of selfish genetic elements.

Citations: 0

Ultra-long sequencing for contiguous haplotype resolution of the human immunoglobulin heavy chain locus

IF 7, Zone 2 (Biology)

Genome research Pub Date : 2025-08-21 DOI: 10.1101/gr.280400.125

Mari B Gornitzka, Egil Røsjø, Uddalok Jana, Easton E Ford, Alan Tourancheau, William Lees, Zachary Vanwinkle, Melissa L Smith, Corey T Watson, Andreas Lossius

Abstract: Genetic diversity within the human immunoglobulin heavy chain (IGH) locus influences the expressed antibody repertoire and susceptibility to infectious and autoimmune diseases. However, repetitive sequences and complex structural variation pose significant challenges for large-scale characterization. Here, we introduce a method that combines Oxford Nanopore Technologies ultra-long sequencing and adaptive sampling with a bioinformatic pipeline to produce haplotype-resolved, annotated IGH assemblies. Notably, our strategy overcomes prior limitations in phasing resolution, enabling single-contig haplotype assemblies that span the entire IGH locus. We apply this method to four individuals and validate the accuracy of the IGH assemblies using Pacific Biosciences HiFi reads, demonstrating near-complete sequence congruence, with only some residual indel errors. Moreover, when applying our pipeline to the reference material HG002, it reveals no base differences and a limited number of indels compared with the Telomere-to-Telomere genome benchmark across the IGH region. Importantly, in the four individuals, our approach uncovers 28 novel alleles and previously uncharacterized large structural variants, including a 120 kb duplication spanning IGHE to IGHA1 within the IGH constant region (IGHC) and, within the IGHV region, an expanded seven-copy IGHV3-23 gene haplotype. These findings underscore the power of our method to resolve the full complexity of the IGH locus and uncover previously unrecognized variants that may affect immune function and disease susceptibility. Thus, our method provides a strong basis for future immunological research and translational applications.

Citations: 0

Unveiling the functional fate of duplicated genes through expression profiling and structural analysis

IF 7, Zone 2 (Biology)

Genome research Pub Date : 2025-08-14 DOI: 10.1101/gr.280166.124

Alex Warwick Vesztrocy, Natasha Glover, Paul D. Thomas, Christophe Dessimoz, Irene Julca

Abstract: Gene duplication is a major evolutionary source of functional innovation. Following duplication events, gene copies (paralogues) may undergo various fates, including retention with functional modifications (such as subfunctionalization or neofunctionalization) or loss. When paralogues are retained, this results in complex orthology relationships, including one-to-many or many-to-many. In such cases, determining which one-to-one pair is more likely to have conserved functions can be challenging. It has been proposed that, following gene duplication, the copy that diverges more slowly in sequence is more likely to maintain the ancestral function, referred to here as "the least diverged orthologue (LDO) conjecture". This study explores this conjecture, using a novel method to identify asymmetric evolution of paralogues and applying it to all gene families across the Tree of Life in the PANTHER database. Structural data for over 1 million proteins and expression data for 16 animals and 20 plants were then used to investigate functional divergence following duplication. This analysis, the most comprehensive to date, revealed that whilst the majority of paralogues display similar rates of sequence evolution, significant differences in branch lengths following gene duplication can be correlated with functional divergence. Overall, the results support the least diverged orthologue conjecture, suggesting that the least diverged orthologue (LDO) tends to retain the ancestral function, whilst the most diverged orthologue (MDO) may acquire a new, potentially specialized, role.

Citations: 0