Genome research | Pub Date: 2025-04-11 | DOI: 10.1101/gr.278836.123
Nathan Hammond, Linda Liao, Pun Wai Tong, Zena Ng, Thuy-Mi P. Nguyen, Chandler Ho, Yao Yang, Stuart A. Scott
{"title":"Analytical validation of germline small variant detection using long-read HiFi genome sequencing","authors":"Nathan Hammond, Linda Liao, Pun Wai Tong, Zena Ng, Thuy-Mi P. Nguyen, Chandler Ho, Yao Yang, Stuart A. Scott","doi":"10.1101/gr.278836.123","DOIUrl":"https://doi.org/10.1101/gr.278836.123","url":null,"abstract":"Long-read sequencing has the capacity to interrogate difficult genomic regions and phase variants; however, short-read sequencing is more commonly implemented for clinical testing. Given the advances in long-read HiFi sequencing chemistry and variant calling, we analytically validated this technology for small variant detection (single nucleotide variants, insertions/deletions; SNVs/indels; <50bp). HiFi genome sequencing was performed on DNA from reference materials and clinical specimen types, and accuracy results were compared to short-read genome sequencing data. HiFi genome sequencing recall and precision across Genome in a Bottle (GIAB)-defined nondifficult and difficult genomic regions (high confidence) for SNVs were >99.9% and >99.7%, respectively, and for indels were >99.8% and >99.1%, respectively. Moreover, HiFi genome sequencing outperformed short-read genome sequencing on overall SNV/indel F1-score accuracy at all paired sequencing depths, which were further stratified across 100 total GIAB-defined genomic regions for a comprehensive evaluation of performance. Of note, HiFi genome sequencing F1-scores for SNVs and indels surpassed 99% at ~15× and ~25×, respectively. In addition, high confidence small variant concordance across all HiFi genome sequencing reproducibility assessments (two specimens, three independent sequencing datasets) were >99.8% for SNVs and >98.6% for indels, and average high confidence small variant concordance between paired blood, saliva, and swab specimens were all >99.8%. 
Taken together, these data underscore that long-read HiFi genome sequencing detection of SNVs and indels is very accurate and robust, which supports the implementation of this technology for clinical diagnostic testing.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"37 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143822649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
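For readers unfamiliar with the metrics in the abstract above: the F1-score is the harmonic mean of precision and recall. The following is a minimal Python sketch under stated assumptions. The function name and the counts are hypothetical and chosen for illustration only; they are not taken from the study or from any benchmarking tool.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from true-positive, false-positive
    and false-negative variant-call counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts, for illustration only (not data from the paper)
precision, recall, f1 = precision_recall_f1(tp=9990, fp=10, fn=10)
```

Because the harmonic mean is pulled toward the smaller of the two values, a caller cannot reach a high F1 by maximizing recall at the expense of precision, or the reverse.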
Genome research | Pub Date: 2025-04-11 | DOI: 10.1101/gr.280359.124
Rita Tam, Mareike Möller, Runpeng Luo, Zhenyan Luo, Ashley Jones, Sambasivam Periyannan, John P Rathjen, Benjamin Schwessinger
{"title":"Long-read genomics reveal extensive nuclear-specific evolution and allele-specific expression in a dikaryotic fungus","authors":"Rita Tam, Mareike Möller, Runpeng Luo, Zhenyan Luo, Ashley Jones, Sambasivam Periyannan, John P Rathjen, Benjamin Schwessinger","doi":"10.1101/gr.280359.124","DOIUrl":"https://doi.org/10.1101/gr.280359.124","url":null,"abstract":"Phased telomere-to-telomere (T2T) genome assemblies are revolutionizing our understanding of long-hidden genome biology dark matter such as centromeres, rDNA repeats, inter-haplotype variation, and allele-specific expression (ASE). Yet insights into dikaryotic fungi that separate their haploid genomes into distinct nuclei are limited. Here we explore the impact of dikaryotism on the genome biology of a long-term asexual clone of the wheat pathogenic fungus <em>Puccinia striiformis f. sp. tritici</em>. We use Oxford Nanopore Technologies (ONT) duplex sequencing combined with Hi-C to generate a T2T nuclear-phased assembly with >99.999% consensus accuracy. We show that this fungus has large regional centromeres enriched in LTR retrotransposons, with a single centromeric dip in methylation that suggests one kinetochore attachment site per chromosome. The centromeres of homologous chromosomes are most often highly diverse in sequence and kinetochore attachment sites are not always positionally conserved. Each nucleus carries a unique array of rDNAs with >200 copies that harbour nucleus-specific sequence variations. The inter-haplotype diversity between the two nuclear genomes is shaped by large-scale structural variations linked to transposable elements. ONT long-read cDNA analysis across dormancy and distinct host infection conditions revealed pervasive ASE for nearly 20% of the heterozygous genes. Genes encoding secreted proteins, including putative virulence effectors, are significantly enriched in ASE genes which appear to be linked to elevated CpG gene body methylation of the lower-expressed allele. 
This suggests that epigenetically regulated ASE is likely a previously overlooked mechanism facilitating plant infection. Overall, our study reveals how dikaryotism uniquely shapes key eukaryotic genome features.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"89 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143822650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome research | Pub Date: 2025-04-10 | DOI: 10.1101/gr.279674.124
Fu Xiang Quah, Miguel Vasconcelos Almeida, Moritz Blumer, Chengwei Ulrika Yuan, Bettina Fischer, Kirsten See, Ben Jackson, Richard Zatha, Bosco Rusuwa, George F. Turner, M. Emília Santos, Hannes Svardal, Martin Hemberg, Richard Durbin, Eric Miska
{"title":"Lake Malawi cichlid pangenome graph reveals extensive structural variation driven by transposable elements","authors":"Fu Xiang Quah, Miguel Vasconcelos Almeida, Moritz Blumer, Chengwei Ulrika Yuan, Bettina Fischer, Kirsten See, Ben Jackson, Richard Zatha, Bosco Rusuwa, George F. Turner, M. Emília Santos, Hannes Svardal, Martin Hemberg, Richard Durbin, Eric Miska","doi":"10.1101/gr.279674.124","DOIUrl":"https://doi.org/10.1101/gr.279674.124","url":null,"abstract":"Pangenome methods have the potential to uncover hitherto undiscovered sequences missing from established reference genomes, making them useful to study evolutionary and speciation processes in diverse organisms. The cichlid fishes of the East African Rift Lakes represent one of nature's most phenotypically diverse vertebrate radiations, but single-nucleotide polymorphism (SNP)–based studies have revealed little sequence difference, with 0.1%–0.25% pairwise divergence between Lake Malawi species. These were based on aligning short reads to a single linear reference genome and ignored the contribution of larger-scale structural variants (SVs). We constructed a pangenome graph that integrates six new and two existing long-read genome assemblies of Lake Malawi haplochromine cichlids. This graph intuitively represents complex and nested variation between the genomes and reveals that the SV landscape is dominated by large insertions, many exclusive to individual assemblies. The graph incorporates a substantial amount of extra sequence across seven species, the total size of which is 33.1% longer than that of a single cichlid genome. Approximately 4.73% to 9.86% of the assembly lengths are estimated as interspecies structural variation between cichlids, suggesting substantial genomic diversity underappreciated in SNP studies. 
Although coding regions remain highly conserved, our analysis uncovers a significant proportion of SV sequences as transposable element (TE) insertions, especially DNA, LINE, and LTR TEs. These findings underscore that the cichlid genome is shaped both by small-nucleotide mutations and large, TE-derived sequence alterations, both of which merit study to understand their interplay in cichlid evolution.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"32 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143819352","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome research | Pub Date: 2025-04-10 | DOI: 10.1101/gr.279458.124
Chengbo Fu, Einari A. Niskanen, Gong-Hong Wei, Zhirong Yang, Marta Sanvicente-García, Marc Güell, Lu Cheng
{"title":"k-mer manifold approximation and projection for visualizing DNA sequences","authors":"Chengbo Fu, Einari A. Niskanen, Gong-Hong Wei, Zhirong Yang, Marta Sanvicente-García, Marc Güell, Lu Cheng","doi":"10.1101/gr.279458.124","DOIUrl":"https://doi.org/10.1101/gr.279458.124","url":null,"abstract":"Identifying and illustrating patterns in DNA sequences are crucial tasks in various biological data analyses. In this task, patterns are often represented by sets of <em>k</em>-mers, the fundamental building blocks of DNA sequences. To visually unveil these patterns, one could project each <em>k</em>-mer onto a point in two-dimensional (2D) space. However, this projection poses challenges owing to the high-dimensional nature of <em>k</em>-mers and their unique mathematical properties. Here, we establish a mathematical system to address the peculiarities of the <em>k</em>-mer manifold. Leveraging this <em>k</em>-mer manifold theory, we develop a statistical method named KMAP for detecting <em>k</em>-mer patterns and visualizing them in 2D space. We applied KMAP to three distinct data sets to showcase its utility. KMAP achieves a comparable performance to the classical method MEME, with ∼90% similarity in motif discovery from HT-SELEX data. In the analysis of H3K27ac ChIP-seq data from Ewing sarcoma (EWS), we find that BACH1, OTX2, and KNCH2 might affect EWS prognosis by binding to promoter and enhancer regions across the genome. We also observe potential colocalization of BACH1, OTX2, and the motif CCCAGGCTGGAGTGC in ∼70 bp windows in the enhancer regions. Furthermore, we find that FLI1 binds to the enhancer regions after ETV6 degradation, indicating competitive binding between ETV6 and FLI1. Moreover, KMAP identifies four prevalent patterns in gene editing data of the AAVS1 locus, aligning with findings reported in the literature. 
These applications underscore that KMAP can be a valuable tool across various biological contexts.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"34 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143819353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
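The abstract above treats k-mers as the basic units of a DNA sequence. As a rough illustration of that idea only, the Python sketch below counts overlapping k-mers and measures the Hamming distance between two of them. It is not the KMAP algorithm, whose details are not given here; the function names and the example sequence are hypothetical.

```python
from collections import Counter


def count_kmers(seq: str, k: int) -> Counter:
    """Count every overlapping substring of length k in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length k-mers."""
    if len(a) != len(b):
        raise ValueError("k-mers must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Illustrative input only
counts = count_kmers("ACGTACG", 3)
distance = hamming("ACG", "ACT")
```

A distance such as the Hamming distance is one common way to decide which k-mers are neighbors before projecting them to 2D; whether KMAP uses exactly this distance is an assumption not confirmed by the abstract.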
Genome research | Pub Date: 2025-04-10 | DOI: 10.1101/gr.279543.124
Byron J. Smith, Chunyu Zhao, Veronika Dubinkina, Xiaofan Jin, Liron Zahavi, Saar Shoer, Jacqueline Moltzau-Anderson, Eran Segal, Katherine S. Pollard
{"title":"Accurate estimation of intraspecific microbial gene content variation in metagenomic data with MIDAS v3 and StrainPGC","authors":"Byron J. Smith, Chunyu Zhao, Veronika Dubinkina, Xiaofan Jin, Liron Zahavi, Saar Shoer, Jacqueline Moltzau-Anderson, Eran Segal, Katherine S. Pollard","doi":"10.1101/gr.279543.124","DOIUrl":"https://doi.org/10.1101/gr.279543.124","url":null,"abstract":"Metagenomics has greatly expanded our understanding of the human gut microbiome by revealing a vast diversity of bacterial species within and across individuals. Even within a single species, different strains can have highly divergent gene content, affecting traits such as antibiotic resistance, metabolism, and virulence. Methods that harness metagenomic data to resolve strain-level differences in functional potential are crucial for understanding the causes and consequences of this intraspecific diversity. The enormous size of pangenome references, strain mixing within samples, and inconsistent sequencing depth present challenges for existing tools that analyze samples one at a time. To address this gap, we updated the MIDAS pangenome profiler, now released as version 3, and developed StrainPGC, an approach to strain-specific gene content estimation that combines strain tracking and correlations across multiple samples. We validate our integrated analysis using a complex synthetic community of strains from the human gut and find that StrainPGC outperforms existing approaches. Analyzing a large, publicly available metagenome collection from inflammatory bowel disease patients and healthy controls, we catalog the functional repertoires of thousands of strains across hundreds of species, capturing extensive diversity missing from reference databases. Finally, we apply StrainPGC to metagenomes from a clinical trial of fecal microbiota transplantation for the treatment of ulcerative colitis. 
We identify two <em>Escherichia coli</em> strains, from two different donors, that are both frequently transmitted to patients but have notable differences in functional potential. StrainPGC and MIDAS v3 together enable precise, intraspecific pangenomic investigations using large collections of metagenomic data without microbial isolation or de novo assembly.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"4 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143819354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome research | Pub Date: 2025-04-10 | DOI: 10.1101/gr.279365.124
Jun Kim, Haoyu Wang, Sevinç Ercan
{"title":"Cohesin organizes 3D DNA contacts surrounding active enhancers in C. elegans","authors":"Jun Kim, Haoyu Wang, Sevinç Ercan","doi":"10.1101/gr.279365.124","DOIUrl":"https://doi.org/10.1101/gr.279365.124","url":null,"abstract":"In mammals, cohesin and CTCF organize the 3D genome into topologically associating domains (TADs) to regulate communication between <em>cis</em>-regulatory elements. Many organisms, including <em>S. cerevisiae</em>, <em>C. elegans</em>, and <em>A. thaliana</em> contain cohesin but lack CTCF. Here, we used <em>C. elegans</em> to investigate the function of cohesin in 3D genome organization in the absence of CTCF. Using Hi-C data, we observe cohesin-dependent features called “fountains,” which have also been reported in zebrafish and mice. These are population average reflections of DNA loops originating from distinct genomic regions and are ∼20–40 kb in <em>C. elegans</em>. Hi-C analysis upon cohesin and WAPL-1 depletion supports the idea that cohesin is preferentially loaded at sites bound by the <em>C. elegans</em> ortholog of NIPBL and loop extrudes in an effectively two-sided manner. ChIP-seq analyses show that cohesin translocation along the fountain trajectory depends on a fully intact complex and is extended upon WAPL-1 depletion. Hi-C contact patterns at individual fountains suggest that cohesin processivity is unequal on each side, possibly owing to collision with cohesin loaded from surrounding sites. The putative cohesin loading sites are closest to active enhancers, and fountain strength is associated with transcription. Compared with mammals, the average processivity of <em>C. elegans</em> cohesin is about 10-fold shorter, and the binding of NIPBL ortholog does not depend on cohesin. 
We propose that preferential loading and loop extrusion by cohesin is an evolutionarily conserved mechanism that regulates the 3D interactions of enhancers in animal genomes.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"9 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143819356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome research | Pub Date: 2025-04-10 | DOI: 10.1101/gr.279955.124
Chen Li, Sijie Chen, Yixin Chen, Haiyang Bian, Minsheng Hao, Lei Wei, Xuegong Zhang
{"title":"TFcomb identifies transcription factor combinations for cellular reprogramming based on single-cell multiomics data","authors":"Chen Li, Sijie Chen, Yixin Chen, Haiyang Bian, Minsheng Hao, Lei Wei, Xuegong Zhang","doi":"10.1101/gr.279955.124","DOIUrl":"https://doi.org/10.1101/gr.279955.124","url":null,"abstract":"Reprogramming cell state transitions provides the potential for cell engineering and regenerative therapy for many diseases. Finding the reprogramming transcription factors (TFs) and their combinations that can direct the desired state transition is crucial for the task. Computational methods have been developed to identify such reprogramming TFs. However, most of them can only generate a ranked list of individual TFs and ignore the identification of TF combinations. Even for individual reprogramming TF identification, current methods often fail to put the real effective reprogramming TFs at the top of their rankings. To address these challenges, we developed TFcomb, a computational method that leverages single-cell multiomics data to identify reprogramming TFs and TF combinations that can direct cell state transitions. We modeled the task of finding reprogramming TFs and their combinations as an inverse problem to enable searching for answers in very high dimensional space, and used Tikhonov regularization to guarantee the generalization ability of solutions. For the coefficient matrix of the model, we designed a graph attention network to augment gene regulatory networks built with single-cell RNA-seq and ATAC-seq data. Benchmarking experiments on data of human embryonic stem cells demonstrated superior performance of TFcomb against existing methods for identifying individual TFs. We curated datasets of multiple cell reprogramming cases and demonstrated that TFcomb can efficiently identify reprogramming TF combinations from a vast pool of potential combinations. 
We applied TFcomb on a dataset of mouse hair follicle development and found key TFs in cell differentiation. All experiments showed that TFcomb is powerful in identifying reprogramming TFs and TF combinations from single-cell datasets to empower future cell engineering.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"74 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143819154","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
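The abstract above mentions Tikhonov regularization for an inverse problem. As a generic illustration (not TFcomb's implementation), the Python sketch below solves min_x ||Ax - b||^2 + lam * ||x||^2 through the normal equations (A^T A + lam I) x = A^T b. The function name, the tiny matrix, and the choice of a dense Gaussian-elimination solver are all assumptions made to keep the example self-contained; lam is assumed to be strictly positive so the system is non-singular.

```python
def tikhonov_solve(A, b, lam):
    """Solve min_x ||A x - b||^2 + lam * ||x||^2 via the normal equations.

    A is a list of rows, b a list of floats, lam > 0 the regularization weight.
    """
    m, n = len(A), len(A[0])
    # Build M = A^T A + lam * I and rhs = A^T b
    M = [[sum(A[r][i] * A[r][j] for r in range(m)) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    rhs = [sum(A[r][i] * b[r] for r in range(m)) for i in range(n)]
    # Gaussian elimination with partial pivoting on the augmented system
    aug = [row + [v] for row, v in zip(M, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


# Illustrative system: A is the 2x2 identity, so x = b / (1 + lam)
x = tikhonov_solve([[1.0, 0.0], [0.0, 1.0]], [2.0, 4.0], lam=1.0)  # [1.0, 2.0]
```

The penalty term shrinks the solution toward zero, which is what stabilizes an otherwise ill-posed inverse problem; a larger lam trades fit for stability.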
Genome research | Pub Date: 2025-03-31 | DOI: 10.1101/gr.279428.124
Yu-Chi Chen, David LJ Vendrami, Maximilian L Huber, Luisa EY Handel, Christopher R Cooney, Joseph Ivan Hoffman, Toni I Gossmann
{"title":"Diverse evolutionary trajectories of mitocoding DNA in mammalian and avian nuclear genomes","authors":"Yu-Chi Chen, David LJ Vendrami, Maximilian L Huber, Luisa EY Handel, Christopher R Cooney, Joseph Ivan Hoffman, Toni I Gossmann","doi":"10.1101/gr.279428.124","DOIUrl":"https://doi.org/10.1101/gr.279428.124","url":null,"abstract":"Sporadically genetic material that originates from an organelle genome integrates into the nuclear genome. However it is unclear what processes maintain such integrations over evolutionary time. Recently it was shown that nuclear DNA of mitochondrial origin (NUMTs) may harbour genes with intact mitochondrial reading frames despite the fact that they are highly divergent from the host's mitochondrial genome. Two major hypotheses have been put forward to explain the existence of such mitocoding nuclear genes: (i) recent introgression from another species and (ii) long-term selection. To investigate whether these intriguing possibilities play a role we scanned the genomes of more than 1,000 avian and mammalian species for NUMTs. We show that the subclass of divergent NUMTs harbouring mitogenes with intact reading frames are widespread across mammals and birds. We show that some of these NUMTs appear to have similarity across species. We also demonstrate that many mitochondrial-coding NUMTs exhibit signs of long-term selection. In a subset of these NUMT genes, we detected evolutionary signals consistent with adaptive evolution, including one human NUMT shared among seven ape species. 
These findings suggest that NUMT insertions may occasionally be functional.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"58 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143736572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome research | Pub Date: 2025-03-26 | DOI: 10.1101/gr.279414.124
Wouter Steyaert, Lydia Sagath, German Demidov, Vicente A. Yépez, Anna Esteve-Codina, Julien Gagneur, Kornelia Ellwanger, Ronny Derks, Marjan Weiss, Amber den Ouden, Simone van den Heuvel, Hilde Swinkels, Nick Zomer, Marloes Steehouwer, Luke O'Gorman, Galuh Astuti, Kornelia Neveling, Rebecca Schüle, Jishu Xu, Matthis Synofzik, Danique Beijer, Holger Hengel, Ludger Schöls, Kristl G. Claeys, Jonathan Baets, Liedewei Van de Vondel, Alessandra Ferlini, Rita Selvatici, Heba Morsy, Marwa Saeed Abd Elmaksoud, Volker Straub, Juliane Müller, Veronica Pini, Luke Perry, Anna Sarkozy, Irina Zaharieva, Francesco Muntoni, Enrico Bugiardini, Kiran Polavarapu, Rita Horvath, Evan Reid, Hanns Lochmüller, Marco Spinazzi, Marco Savarese, Solve-RD DITF-ITHACA, Solve-RD DITF-Euro-NMD, Solve-RD DITF-RND, Solve-RD DITF-EpiCARE, Leslie Matalonga, Steven Laurie, Han G. Brunner, Holm Graessner, Sergi Beltran, Stephan Ossowski, Lisenka E.L.M. Vissers, Christian Gilissen, Alexander Hoischen, on behalf of the Solve-RD consortium
{"title":"Unraveling undiagnosed rare disease cases by HiFi long-read genome sequencing","authors":"Wouter Steyaert, Lydia Sagath, German Demidov, Vicente A. Yépez, Anna Esteve-Codina, Julien Gagneur, Kornelia Ellwanger, Ronny Derks, Marjan Weiss, Amber den Ouden, Simone van den Heuvel, Hilde Swinkels, Nick Zomer, Marloes Steehouwer, Luke O'Gorman, Galuh Astuti, Kornelia Neveling, Rebecca Schüle, Jishu Xu, Matthis Synofzik, Danique Beijer, Holger Hengel, Ludger Schöls, Kristl G. Claeys, Jonathan Baets, Liedewei Van de Vondel, Alessandra Ferlini, Rita Selvatici, Heba Morsy, Marwa Saeed Abd Elmaksoud, Volker Straub, Juliane Müller, Veronica Pini, Luke Perry, Anna Sarkozy, Irina Zaharieva, Francesco Muntoni, Enrico Bugiardini, Kiran Polavarapu, Rita Horvath, Evan Reid, Hanns Lochmüller, Marco Spinazzi, Marco Savarese, Solve-RD DITF-ITHACA, Solve-RD DITF-Euro-NMD, Solve-RD DITF-RND, Solve-RD DITF-EpiCARE, Leslie Matalonga, Steven Laurie, Han G. Brunner, Holm Graessner, Sergi Beltran, Stephan Ossowski, Lisenka E.L.M. Vissers, Christian Gilissen, Alexander Hoischen, on behalf of the Solve-RD consortium","doi":"10.1101/gr.279414.124","DOIUrl":"https://doi.org/10.1101/gr.279414.124","url":null,"abstract":"Solve-RD is a pan-European rare disease (RD) research program that aims to identify disease-causing genetic variants in previously undiagnosed RD families. We utilized 10-fold coverage HiFi long-read sequencing (LRS) for detecting causative structural variants (SVs), single-nucleotide variants (SNVs), insertion-deletions (indels), and short tandem repeat (STR) expansions in previously studied RD families without a clear molecular diagnosis. Our cohort includes 293 individuals from 114 genetically undiagnosed RD families selected by European Reference Network (ERN) experts. Of these, 21 families were affected by so-called “unsolvable” syndromes for which genetic causes remain unknown and for which prior testing was not a prerequisite. 
The remaining 93 families had at least one individual affected by a rare neurological, neuromuscular, or epilepsy disorder without a genetic diagnosis despite extensive prior testing. Clinical interpretation and orthogonal validation of variants in known disease genes yielded 12 novel genetic diagnoses due to de novo and rare inherited SNVs, indels, SVs, and STR expansions. In an additional five families, we identified a candidate disease-causing variant, including an <em>MCF2</em>/<em>FGF13</em> fusion and a <em>PSMA3</em> deletion. However, no common genetic cause was identified in any of the “unsolvable” syndromes. Taken together, we found (likely) disease-causing genetic variants in 11.8% of previously unsolved families and additional candidate disease-causing SVs in another 5.4% of these families. In conclusion, our results demonstrate the potential added value of HiFi long-read genome sequencing in undiagnosed rare diseases.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"35 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143712967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome research | Pub Date: 2025-03-24 | DOI: 10.1101/gr.279943.124
Inswasti Cahyani, John Tyson, Nadine Holmes, Joshua Quick, Chris Moore, Nicholas James Loman, Matt Loose
{"title":"An optimized toolkit for high molecular weight DNA extraction and ultra-long-read nanopore sequencing using glass beads and hexamminecobalt(III) chloride","authors":"Inswasti Cahyani, John Tyson, Nadine Holmes, Joshua Quick, Chris Moore, Nicholas James Loman, Matt Loose","doi":"10.1101/gr.279943.124","DOIUrl":"https://doi.org/10.1101/gr.279943.124","url":null,"abstract":"Since the advent of long-read sequencing, achieving longer read lengths has been a key goal for many users. Ultra-long read sets (N50 > 100 kb) produced from Oxford Nanopore sequencers have improved genome assemblies in recent years. However, despite progress in extraction protocols and library preparation methods, ultra-long sequencing remains challenging for many sample types. Here we compare various methods and introduce the FindingNemo protocol that: (1) optimizes ultra-high molecular weight (UHMW) DNA extraction and library clean-up by using glass beads and Hexamminecobalt(III) chloride (CoHex), (2) can deliver high ultra-long sequencing yield of >20 Gb of reads from a single MinION flow cell or >100 Gb from PromethION devices (R9.4 to R10.4 pore variants), and (3) is scalable to using fewer input cells or lower DNA amounts, with extraction to sequencing possible in a single working day. 
By comparison, we demonstrate that this protocol surpasses previous methods by enabling precise determination of input DNA quantity and quality through cell counting, sample dilution, and homogenization techniques.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"10 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143695663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
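The abstract above defines ultra-long read sets by their N50 (> 100 kb). As a brief illustration, N50 is the read length L such that reads of length at least L together contain at least half of all sequenced bases. The Python sketch below computes it; the function name and the read lengths are hypothetical and are not data from the paper.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of read lengths."""
    if not lengths:
        raise ValueError("need at least one read length")
    total = sum(lengths)
    running = 0
    # Walk from the longest read down until half of all bases are covered
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # unreachable for non-empty positive input


# Illustrative read lengths only
example_n50 = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])  # 8
```

Unlike the mean or median read length, N50 is weighted by bases rather than by read count, so a large number of very short reads lowers it only slightly.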