Franjo Ivankovic, Arthur Ko, Jose Soto, Morgan Aster, Ricky Magner, Kate Balaconis, Beth Sheets, Lee Lichtenstein, Benjamin Neale, Chris Kachulis, Brian L. Browning
European Neuropsychopharmacology, Volume 99, Pages 18-19, published 2025-10-01. DOI: 10.1016/j.euroneuro.2025.08.490. Source: https://www.sciencedirect.com/science/article/pii/S0924977X25006480
NEXT-GENERATION IMPUTATION FOR PSYCHIATRIC GENETICS: A NOVEL 515K DIVERSE REFERENCE PANEL AND SERVICE
Accurate genotype imputation and cross-dataset harmonization of genomic data are essential for identifying the complex genetic underpinnings of mental health disorders. One of the most significant obstacles to expanding diversity in genetic studies today is the lack of adequate reference data for individuals of non-European ancestries. While the recruitment of participants from diverse populations has improved, genomic tools for analyzing their data are still lacking. Large electronic health record cohorts and biobanks, such as the NIH initiatives All of Us (AoU) and Analysis, Visualization, and Informatics Lab-space (AnVIL), have aggregated data from hundreds of thousands of individuals of diverse ancestries.
Here, we present a novel reference panel built from 515,579 individuals in the AoU and AnVIL data. In addition to being the largest reference panel to date, this panel prioritizes ancestral diversity: it encompasses 261,163 samples from non-European ancestries, a representation nearly twice the size of the entire TOPMed reference panel. Specifically, the panel includes 101,982 individuals of African, 90,553 of admixed/Latin American, 13,226 of East Asian, 9,710 of South Asian, 1,065 of Middle Eastern/North African, and 44,627 of other non-European ancestries.
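The counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below only re-adds the figures stated in this abstract (the per-cohort sample counts appear in the methods paragraph); the variable names are illustrative and not part of any released tool.

```python
# Sample counts as quoted in the abstract.
cohorts = {"AoU": 414_830, "AnVIL": 100_749}

# Non-European ancestry groups as quoted in the abstract.
non_european = {
    "African": 101_982,
    "Admixed/Latin American": 90_553,
    "East Asian": 13_226,
    "South Asian": 9_710,
    "Middle Eastern/North African": 1_065,
    "Other non-European": 44_627,
}

total_samples = sum(cohorts.values())
total_non_european = sum(non_european.values())
non_european_fraction = total_non_european / total_samples

print(f"total samples:       {total_samples:,}")
print(f"non-European:        {total_non_european:,}")
print(f"non-European share:  {non_european_fraction:.1%}")
```

Both sums agree with the totals given in the text (515,579 and 261,163), so roughly half of the panel is of non-European ancestry.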
High-coverage (30x) whole-genome sequencing data from 414,830 AoU and 100,749 AnVIL samples were harmonized and quality-controlled using Hail. A total of 665,398,839 high-quality autosomal variants were exported as VCF files after removing variants with more than 31 alternate alleles, an average sum of allele depths (AD, used as a proxy for depth, DP) below 12, excessive heterozygosity, a mean genotype quality (GQ) below 30, or a call rate below 0.9, as well as singletons. The exported VCFs were subsequently phased using Beagle 5.5 and shuffled with RESHAPE to increase the security of the sensitive data.
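The variant filters listed above can be written as a simple predicate. The following is a minimal plain-Python sketch, not the authors' Hail pipeline. The `VariantStats` record and its field names are hypothetical. It assumes a singleton means an alternate allele count of 1. It also takes excess heterozygosity as a precomputed flag, since the abstract gives no threshold for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantStats:
    """Hypothetical per-variant summary; field names are illustrative."""
    n_alt_alleles: int     # number of alternate alleles at the site
    mean_ad_sum: float     # average sum of allele depths (proxy for DP)
    alt_allele_count: int  # alternate allele copies observed across samples
    mean_gq: float         # mean genotype quality
    call_rate: float       # fraction of samples with a called genotype
    excess_het: bool       # flagged upstream; no threshold is stated


# Thresholds as stated in the abstract.
MAX_ALT_ALLELES = 31
MIN_MEAN_AD_SUM = 12
MIN_MEAN_GQ = 30
MIN_CALL_RATE = 0.9


def failed_filters(v: VariantStats) -> list[str]:
    """Return the names of the filters a variant fails (empty if it passes)."""
    failures = []
    if v.n_alt_alleles > MAX_ALT_ALLELES:
        failures.append("too_many_alt_alleles")
    if v.mean_ad_sum < MIN_MEAN_AD_SUM:
        failures.append("low_depth")
    if v.alt_allele_count == 1:  # assumed definition of a singleton
        failures.append("singleton")
    if v.excess_het:
        failures.append("excess_heterozygosity")
    if v.mean_gq < MIN_MEAN_GQ:
        failures.append("low_gq")
    if v.call_rate < MIN_CALL_RATE:
        failures.append("low_call_rate")
    return failures


def passes_qc(v: VariantStats) -> bool:
    """A variant is kept only if it fails none of the filters."""
    return not failed_filters(v)


example = VariantStats(n_alt_alleles=1, mean_ad_sum=31.5, alt_allele_count=42,
                       mean_gq=55.0, call_rate=0.99, excess_het=False)
print(passes_qc(example))
```

Returning the list of failed filters rather than a bare boolean makes it easy to tabulate how many variants each criterion removes. Values exactly at a threshold (for example, a call rate of 0.9) are kept, following the "less than" and "under" wording of the text.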
This resource will significantly improve the accuracy of genotype imputation, particularly for rare variants and underrepresented populations, empowering novel discoveries in psychiatric genetics. The reference panel will be available through the Broad Institute of MIT and Harvard for imputing genotype array data (using Beagle) and low-pass sequencing data (using GLIMPSE). The service is set to become available in mid-2025 and will first be accessible as a command-line tool, with a web-based user interface to follow.
About the journal:
European Neuropsychopharmacology is the official publication of the European College of Neuropsychopharmacology (ECNP). In accordance with the mission of the College, the journal focuses on clinical and basic science contributions that advance our understanding of brain function and human behaviour and enable translation into improved treatments and enhanced public health impact in psychiatry. Recent years have been characterized by exciting advances in basic knowledge and available experimental techniques in neuroscience and genomics. However, clinical translation of these findings has not been as rapid. The journal aims to narrow this gap by promoting findings that are expected to have a major impact on both our understanding of the biological bases of mental disorders and the development and improvement of treatments, ideally paving the way for prevention and recovery.