Next-Generation Imputation for Psychiatric Genetics: A Novel 515K Diverse Reference Panel and Service

IF 6.7, Region 2 (Medicine), JCR Q1 Clinical Neurology

European Neuropsychopharmacology, Pub Date: 2025-10-01, DOI: 10.1016/j.euroneuro.2025.08.490

Franjo Ivankovic, Arthur Ko, Jose Soto, Morgan Aster, Ricky Magner, Kate Balaconis, Beth Sheets, Lee Lichtenstein, Benjamin Neale, Chris Kachulis, Brian L. Browning

Volume 99, Pages 18-19. Full text: https://www.sciencedirect.com/science/article/pii/S0924977X25006480

Citations: 0

Abstract



Accurate genotype imputation and cross-dataset harmonization of genomic data are essential for identifying the complex genetic underpinnings of mental health disorders. One of the most significant obstacles to expanding diversity in genetic studies today is the lack of adequate reference data for individuals of non-European ancestries. While the recruitment of participants from diverse populations has improved, genomic tools for analyzing their data are still lacking. Large electronic health record and biobank resources, such as the NIH initiatives All of Us (AoU) and Analysis, Visualization, and Informatics Lab-space (AnVIL), have aggregated data from hundreds of thousands of individuals of diverse ancestries.

Here, we present a novel reference panel built from 515,579 individuals in the AoU and AnVIL data. In addition to being the largest reference panel to date, this panel prioritizes ancestral diversity, encompassing 261,163 samples from non-European ancestries, a representation nearly twice the size of the entire TOPMed reference panel. Specifically, the panel includes 101,982 individuals with African, 90,553 with admixed/Latin American, 13,226 with East Asian, 9,710 with South Asian, 1,065 with Middle Eastern/North African, and 44,627 with other non-European ancestries.
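The reported counts can be cross-checked with a few lines of arithmetic. The sketch below uses only figures stated in this abstract (the per-cohort sample counts and the per-ancestry breakdown); the variable and function names are illustrative and not part of the authors' pipeline.

```python
# Sample counts exactly as reported in the abstract.
COHORTS = {"AoU": 414_830, "AnVIL": 100_749}

NON_EUROPEAN = {
    "African": 101_982,
    "Admixed/Latin American": 90_553,
    "East Asian": 13_226,
    "South Asian": 9_710,
    "Middle Eastern/North African": 1_065,
    "Other non-European": 44_627,
}


def summarize():
    """Return (total samples, non-European samples, non-European fraction)."""
    total = sum(COHORTS.values())
    non_european = sum(NON_EUROPEAN.values())
    return total, non_european, non_european / total


total, non_european, fraction = summarize()
print(f"Total samples:        {total:,}")
print(f"Non-European samples: {non_european:,} ({fraction:.1%})")
```

Both sums agree with the stated totals (515,579 and 261,163), so roughly half of the panel is of non-European ancestry.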

High-coverage (30x) whole-genome sequencing data from 414,830 AoU and 100,749 AnVIL samples were harmonized and quality-controlled using Hail. A total of 665,398,839 high-quality autosomal variants were exported as VCF files after removing singletons and variants with more than 31 alternate alleles, a mean sum of allele depths (AD, used as a proxy for depth, DP) below 12, excessive heterozygosity, a mean genotype quality (GQ) under 30, or a call rate under 0.9. The exported VCFs were subsequently phased using Beagle 5.5 and shuffled with RESHAPE to increase the security of the sensitive data.
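As a rough illustration of how the variant filters listed above combine, here is a minimal pure-Python sketch. It is not the authors' Hail code: the `VariantSummary` record and its field names are hypothetical, singletons are assumed to mean an alternate allele count of 1, and because the abstract gives no threshold for excessive heterozygosity, that filter is represented as a precomputed boolean flag. Only the numeric cutoffs (31, 12, 30, 0.9) come from the text.

```python
from dataclasses import dataclass


@dataclass
class VariantSummary:
    """Hypothetical per-variant summary statistics (field names are assumptions)."""
    n_alt_alleles: int      # number of alternate alleles at the site
    mean_ad_sum: float      # mean over samples of the summed allele depths (AD)
    alt_allele_count: int   # total alternate allele count across samples
    excess_het: bool        # precomputed flag; the abstract states no threshold
    mean_gq: float          # mean genotype quality (GQ)
    call_rate: float        # fraction of samples with a called genotype


# Cutoffs quoted in the abstract.
MAX_ALT_ALLELES = 31
MIN_MEAN_AD_SUM = 12
MIN_MEAN_GQ = 30
MIN_CALL_RATE = 0.9


def failed_filters(v: VariantSummary) -> list[str]:
    """Return the names of every filter the variant fails (empty list = pass)."""
    reasons = []
    if v.n_alt_alleles > MAX_ALT_ALLELES:
        reasons.append("too_many_alt_alleles")
    if v.mean_ad_sum < MIN_MEAN_AD_SUM:
        reasons.append("low_depth")
    if v.alt_allele_count == 1:  # assumed definition of a singleton
        reasons.append("singleton")
    if v.excess_het:
        reasons.append("excess_heterozygosity")
    if v.mean_gq < MIN_MEAN_GQ:
        reasons.append("low_gq")
    if v.call_rate < MIN_CALL_RATE:
        reasons.append("low_call_rate")
    return reasons


def passes_qc(v: VariantSummary) -> bool:
    """A variant is kept only if it fails none of the filters."""
    return not failed_filters(v)


good = VariantSummary(1, 30.0, 250, False, 60.0, 0.99)
rare = VariantSummary(1, 30.0, 1, False, 60.0, 0.99)
print(passes_qc(good), failed_filters(rare))
```

Returning the list of failed filters rather than a bare boolean makes it easy to tally how many variants each criterion removes, which is a common sanity check when tuning QC thresholds.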

This resource will significantly improve the accuracy of genotype imputation, particularly for rare variants and underrepresented populations, empowering novel discoveries in psychiatric genetics. The reference panel will be available through the Broad Institute of MIT and Harvard for imputing genotype array data (using Beagle) and low-pass sequencing data (using GLIMPSE). The service is set to become available in mid-2025 and will first be accessible as a command-line tool, with a web-based user interface to follow.


Source journal

European Neuropsychopharmacology (Medicine, Psychiatry)

CiteScore: 10.30
Self-citation rate: 5.40%
Articles published: 730
Review time: 41 days

About the journal: European Neuropsychopharmacology is the official publication of the European College of Neuropsychopharmacology (ECNP). In accordance with the mission of the College, the journal focuses on clinical and basic science contributions that advance our understanding of brain function and human behaviour and enable translation into improved treatments and enhanced public health impact in psychiatry. Recent years have been characterized by exciting advances in basic knowledge and available experimental techniques in neuroscience and genomics. However, clinical translation of these findings has not been as rapid. The journal aims to narrow this gap by promoting findings that are expected to have a major impact on both our understanding of the biological bases of mental disorders and the development and improvement of treatments, ideally paving the way for prevention and recovery.