Ultra-long sequencing for contiguous haplotype resolution of the human immunoglobulin heavy chain locus

IF 5.5 | Zone 2 (Biology) | JCR Q1: BIOCHEMISTRY & MOLECULAR BIOLOGY

Genome Research | Pub Date: 2025-08-21 | DOI: 10.1101/gr.280400.125

Mari B Gornitzka, Egil Røsjø, Uddalok Jana, Easton E Ford, Alan Tourancheau, William Lees, Zachary Vanwinkle, Melissa L Smith, Corey T Watson, Andreas Lossius


Citations: 0

Abstract

Ultra-long sequencing for contiguous haplotype resolution of the human immunoglobulin heavy chain locus

Genetic diversity within the human immunoglobulin heavy chain (IGH) locus influences the expressed antibody repertoire and susceptibility to infectious and autoimmune diseases. However, repetitive sequences and complex structural variation pose significant challenges for large-scale characterization. Here, we introduce a method that combines Oxford Nanopore Technologies ultra-long sequencing and adaptive sampling with a bioinformatic pipeline to produce haplotype-resolved, annotated IGH assemblies. Notably, our strategy overcomes prior limitations in phasing resolution, enabling single-contig haplotype assemblies that span the entire IGH locus. We apply this method to four individuals and validate the accuracy of the IGH assemblies using Pacific Biosciences HiFi reads, demonstrating near-complete sequence congruence, with only some residual indel errors. Moreover, when applying our pipeline to the reference material HG002, it reveals no base differences and a limited number of indels compared with the Telomere-to-Telomere genome benchmark across the IGH region. Importantly, in the four individuals, our approach uncovers 28 novel alleles and previously uncharacterized large structural variants, including a 120 kb duplication spanning IGHE to IGHA1 within the IGH constant region (IGHC) and, within the IGHV region, an expanded seven-copy IGHV3-23 gene haplotype. These findings underscore the power of our method to resolve the full complexity of the IGH locus and uncover previously unrecognized variants that may affect immune function and disease susceptibility. Thus, our method provides a strong basis for future immunological research and translational applications.


Source journal

Genome Research (Biology: Biochemistry & Molecular Biology)

CiteScore: 12.40

Self-citation rate: 1.40%

Articles published: 140

Review time: 6 months

About the journal: Launched in 1995, Genome Research is an international, continuously published, peer-reviewed journal that focuses on research that provides novel insights into the genome biology of all organisms, including advances in genomic medicine. Among the topics considered by the journal are genome structure and function, comparative genomics, molecular evolution, genome-scale quantitative and population genetics, proteomics, epigenomics, and systems biology. The journal also features exciting gene discoveries and reports of cutting-edge computational biology and high-throughput methodologies. New data in these areas are published as research papers, or methods and resource reports that provide novel information on technologies or tools that will be of interest to a broad readership. Complete data sets are presented electronically on the journal's web site where appropriate. The journal also provides Reviews, Perspectives, and Insight/Outlook articles, which present commentary on the latest advances published both here and elsewhere, placing such progress in its broader biological context.