Identification of influential rare variants in aggregate testing using random forest importance measures

IF 1 · Zone 4, Biology · Q4, Genetics & Heredity
Rachel Z. Blumhagen, David A. Schwartz, Carl D. Langefeld, Tasha E. Fingerlin
Annals of Human Genetics, vol. 87, no. 4, pp. 184-195. Published 2023-05-23.
DOI: 10.1111/ahg.12509
https://onlinelibrary.wiley.com/doi/10.1111/ahg.12509

Abstract

Aggregate tests of rare variants are often used to identify associated regions, as an alternative to testing each variant individually. When an aggregate test is significant, it is of interest to identify which rare variants are “driving” the association. We recently developed the rare variant influential filtering tool (RIFT) to identify influential rare variants and showed that RIFT had higher true positive rates than other published methods. Here we use importance measures from the standard random forest (RF) and the variable importance weighted RF (vi-RF) to identify influential variants. For very rare variants (minor allele frequency [MAF] < 0.001), the vi-RF:Accuracy method had the highest median true positive rate (TPR = 0.24; interquartile range [IQR]: 0.13, 0.42), followed by the RF:Accuracy method (TPR = 0.16; IQR: 0.07, 0.33); both were superior to RIFT (TPR = 0.05; IQR: 0.02, 0.15). Among uncommon variants (0.001 < MAF < 0.03), the RF methods had higher true positive rates than RIFT, with comparable false positive rates. Finally, we applied the RF methods to a targeted resequencing study in idiopathic pulmonary fibrosis (IPF), in which the vi-RF approach identified eight variants in TERT and seven in FAM13A. In summary, the vi-RF provides an improved, objective approach to identifying influential variants following a significant aggregate test. We have expanded our previously developed R package RIFT to include the random forest methods.
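
The true and false positive rates reported above can be illustrated with a minimal Python sketch. It is not taken from the paper or from the RIFT package. It assumes that TPR is the fraction of truly influential variants that a method flags, and that FPR is the fraction of non-influential variants that it flags; the abstract does not spell out these definitions. The function names, the variant labels, and the toy frequencies are hypothetical, and placing a MAF of exactly 0.001 in the "uncommon" bin is also an assumption.

```python
from typing import Iterable, Set, Tuple


def maf_bin(maf: float) -> str:
    """Assign a variant to one of the MAF bins named in the abstract.

    Assumption: a MAF of exactly 0.001 goes to "uncommon"; the abstract
    leaves that boundary open.
    """
    if maf < 0.001:
        return "very rare"
    if maf < 0.03:
        return "uncommon"
    return "other"


def tpr_fpr(
    flagged: Set[str], causal: Set[str], all_variants: Iterable[str]
) -> Tuple[float, float]:
    """Return (TPR, FPR) for a set of flagged variants.

    TPR = flagged causal variants / all causal variants.
    FPR = flagged non-causal variants / all non-causal variants.
    """
    variants = set(all_variants)
    causal_here = causal & variants
    non_causal = variants - causal_here
    tp = len(flagged & causal_here)
    fp = len(flagged & non_causal)
    tpr = tp / len(causal_here) if causal_here else float("nan")
    fpr = fp / len(non_causal) if non_causal else float("nan")
    return tpr, fpr


# Hypothetical toy data: variant label -> minor allele frequency.
mafs = {"v1": 0.0005, "v2": 0.0008, "v3": 0.002, "v4": 0.01, "v5": 0.02}
causal = {"v1", "v3", "v4"}    # variants simulated as influential
flagged = {"v1", "v4", "v5"}   # variants a method reported as influential

tpr, fpr = tpr_fpr(flagged, causal, mafs)
bins = {name: maf_bin(maf) for name, maf in mafs.items()}
print(f"TPR={tpr:.2f}, FPR={fpr:.2f}")
```

In a simulation study these rates would be computed per replicate and per MAF bin, then summarized by the median and IQR as in the abstract.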

Source journal
Annals of Human Genetics (Biology: Genetics)
CiteScore: 4.20
Self-citation rate: 0.00%
Articles published: 34
Review time: 3 months
About the journal: Annals of Human Genetics publishes material directly concerned with human genetics or the application of scientific principles and techniques to any aspect of human inheritance. Papers that describe work on other species that may be relevant to human genetics will also be considered. Mathematical models should include examples of application to data where possible. Authors are welcome to submit Supporting Information, such as data sets or additional figures or tables, that will not be published in the print edition of the journal, but which will be viewable via the online edition and stored on the website.