Expanding biobank pharmacogenomics through machine learning calls of structural variation.

IF 3.3 · CAS Zone 3 (Biology) · JCR Q2 GENETICS & HEREDITY
Genetics · Pub Date: 2025-05-09 · DOI: 10.1093/genetics/iyaf088
Brett Vanderwerff, Amy L Pasternak, Lars G Fritsche, Emily Bertucci-Richter, Snehal Patil, Michael Boehnke, Xiang Zhou, Sebastian Zöllner, Daniel L Hertz, Matthew Zawistowski
Citations: 0

Abstract

Biobanks linking genetic data with clinical health records provide exciting opportunities for pharmacogenomic (PGx) research on genetic variation and drug response. Designed as central and multi-use resources, biobanks can facilitate diverse PGx research efforts, including the study of drug efficacy and adverse effects. Specialized PGx alleles and phenotypes are critical for such studies and can be conveniently called from existing array-based genotypes routinely collected in most biobanks. We describe a central callset of PGx alleles and phenotypes in over 80,000 participants of the Michigan Genomics Initiative (MGI) biobank, created using the PyPGx software on TOPMed imputed genotypes. The array-based PGx allele calls demonstrate concordance (>92%) with a set of PCR-validated alleles collected during clinical care, but do not identify PGx alleles dependent on structural variation, including the clinically important CYP2D6*5 deletion. To address this, we developed a support vector machine trained on genotype array SNV probe intensities to classify CYP2D6*5 carriers. This method had >99% accuracy and reclassified ∼7% of African American and ∼4% of White MGI participants to lower activity metabolizer phenotypes, predicting higher risks of adverse drug reactions. We demonstrate that central PGx callsets created with existing tools and genetic data can be augmented by customized calls for challenging alleles based on structural variants to broaden the research potential and clinical utility of biobanks. These PGx callsets can be created in biobanks with existing array-based genotype data and highlight the utility of advanced computational methods in PGx allele identification.
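The reclassification described in the abstract can be illustrated with the CPIC activity-score convention for CYP2D6. Under that convention each star allele carries an activity value, and the sum over the diplotype maps to a metabolizer phenotype. The Python sketch below is not the authors' pipeline and does not use the PyPGx API. Several pieces are assumptions made for illustration: the small allele subset, the boolean carrier flag standing in for the SVM output, and the hypothetical helper `apply_deletion_call`. It also ignores gene duplications, CYP2D6/CYP2D7 hybrid alleles, and `*5/*5` homozygotes.

```python
"""Sketch: how a CYP2D6*5 deletion call can lower a CPIC activity score.

Assumptions (not from the paper): only a few star alleles are modeled, the
deletion classifier is reduced to a boolean flag, and gene duplications,
hybrid alleles, and *5/*5 homozygotes are ignored.
"""

# Per-allele CYP2D6 activity values following CPIC conventions
# (small illustrative subset, not a complete allele table).
ACTIVITY_VALUE = {
    "*1": 1.0, "*2": 1.0,   # normal function
    "*9": 0.5, "*41": 0.5,  # decreased function
    "*10": 0.25,            # decreased function
    "*4": 0.0, "*5": 0.0,   # no function; *5 is a whole-gene deletion
}


def activity_score(diplotype: str) -> float:
    """Sum the activity values of the two haplotypes, e.g. '*1/*41' -> 1.5."""
    first, second = diplotype.split("/")
    return ACTIVITY_VALUE[first] + ACTIVITY_VALUE[second]


def phenotype(score: float) -> str:
    """Map an activity score to a metabolizer phenotype (CPIC 2019 bins)."""
    if score == 0:
        return "poor metabolizer"
    if score <= 1.0:
        return "intermediate metabolizer"
    if score <= 2.25:
        return "normal metabolizer"
    return "ultrarapid metabolizer"


def apply_deletion_call(array_diplotype: str, carries_star5: bool) -> str:
    """Hypothetical helper: fold a *5 carrier flag into an SNV-based call.

    A heterozygous deletion leaves one gene copy, so SNV probes see only one
    haplotype and the array-based call looks homozygous (e.g. '*1/*1').
    """
    if not carries_star5:
        return array_diplotype
    first, second = array_diplotype.split("/")
    if first != second:
        raise ValueError("a single *5 deletion should yield a homozygous-looking SNV call")
    return f"{first}/*5"


if __name__ == "__main__":
    for call, flagged in [("*1/*1", True), ("*1/*41", False), ("*10/*10", True)]:
        updated = apply_deletion_call(call, flagged)
        before = phenotype(activity_score(call))
        after = phenotype(activity_score(updated))
        print(f"{call} ({before}) -> {updated} ({after})")
```

Run as a script, the loop shows an apparent `*1/*1` normal metabolizer (score 2.0) becoming a `*1/*5` intermediate metabolizer (score 1.0) once the deletion flag is applied. This is the direction of change the abstract reports.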

Source journal: Genetics (GENETICS & HEREDITY)
CiteScore: 6.90
Self-citation rate: 6.10%
Articles published: 177
Review time: 1.5 months
Journal description: GENETICS is published by the Genetics Society of America, a scholarly society that seeks to deepen our understanding of the living world by advancing our understanding of genetics. Since 1916, GENETICS has published high-quality, original research presenting novel findings bearing on genetics and genomics. The journal publishes empirical studies of organisms ranging from microbes to humans, as well as theoretical work. While it has an illustrious history, GENETICS has changed along with the communities it serves: it is not your mentor's journal. The editors make decisions quickly (in around 30 days) without sacrificing the excellence and scholarship for which the journal has long been known. GENETICS is a peer-reviewed, peer-edited journal with an international reach and increasing visibility and impact. All editorial decisions are made through collaboration of at least two editors who are practicing scientists. GENETICS is constantly innovating: expanded types of content include Reviews, Commentary (current issues of interest to geneticists), Perspectives (historical), Primers (to introduce primary literature into the classroom), Toolbox Reviews, plus YeastBook, FlyBook, and WormBook (coming spring 2016). For particularly time-sensitive results, we publish Communications. As part of our mission to serve our communities, we've published thematic collections, including Genomic Selection, Multiparental Populations, Mouse Collaborative Cross, and the Genetics of Sex.