AoUPRS: A Cost-Effective and Versatile PRS Calculator for the All of Us Program

bioRxiv, published 2024-07-16. DOI: 10.1101/2024.07.11.603165
Ahmed Khattab, Shang-Fu Chen, Nathan Wineinger, Ali Torkamani

Abstract

Background: The All of Us (AoU) Research Program provides a comprehensive genomic dataset intended to accelerate health research and medical breakthroughs. Despite this potential, researchers face significant challenges, including the high cost and inefficiency of extracting and analyzing the data. AoUPRS addresses these challenges with a versatile, cost-effective tool for calculating polygenic risk scores (PRS), enabling both experienced and novice researchers to use the AoU dataset for genomic discovery.

Results: AoUPRS is implemented in Python and uses the Hail framework for genomic data analysis. It offers two approaches to PRS calculation, one based on the Hail MatrixTable (MT) and one based on the Hail VariantDataset (VDS). The MT approach works on a dense representation of genotype data, whereas the VDS approach works on a sparse representation, which substantially reduces computational cost. In performance evaluations, the VDS approach reduced cost relative to the MT approach by up to 99.51% for smaller scores and 85% for larger scores. Logistic regression analyses of PRS for coronary artery disease, atrial fibrillation, and type 2 diabetes showed that both approaches have similar predictive power, and the empirical cumulative distribution functions (ECDFs) of the PRS values further confirmed the consistency between the two methods.

Conclusions: AoUPRS is a versatile and cost-effective tool that addresses the high costs and inefficiencies of calculating PRS with the AoU dataset. Because it offers both dense and sparse processing approaches, researchers can choose the approach best suited to their needs, facilitating genomic discovery. The tool is open source and available on GitHub with detailed documentation and tutorials, making it accessible and easy to use for the scientific community.
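Both approaches compute the same quantity: a PRS is conventionally the weighted sum, over the variants in a score, of each participant's effect-allele dosage times that variant's weight. The toy sketch below (plain Python, no Hail) illustrates why a sparse representation can reproduce the dense result while visiting fewer entries: homozygous-reference calls contribute a fixed baseline, so only non-reference calls need to be touched. It is an illustration under stated assumptions, not the AoUPRS implementation. The dictionaries, the `chrom:pos:ref:alt` identifiers and the functions `effect_dosage`, `prs_dense` and `prs_sparse` are hypothetical, and absent sparse entries are simply assumed to be homozygous reference (a real Hail VDS stores reference blocks separately, which is how hom-ref can be told apart from missing data).

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy structures (not AoUPRS or Hail APIs):
#   score:  variant ID "chrom:pos:ref:alt" -> (effect allele, weight)
#   dense:  variant ID -> alt-allele dosage (0/1/2) for every sample, in sample order
#   sparse: variant ID -> {sample: alt dosage} for non-reference calls only
Score = Dict[str, Tuple[str, float]]


def effect_dosage(alt_dosage: int, effect: str, ref: str, alt: str) -> int:
    """Copies of the effect allele carried, given the alt-allele dosage."""
    if effect == ref:
        return 2 - alt_dosage
    if effect == alt:
        return alt_dosage
    raise ValueError(f"effect allele {effect!r} matches neither {ref!r} nor {alt!r}")


def prs_dense(samples: List[str], dense: Dict[str, List[int]], score: Score) -> Dict[str, float]:
    """Dense: visit every (variant, sample) entry, including hom-ref calls."""
    prs = {s: 0.0 for s in samples}
    for vid, (effect, weight) in score.items():
        _, _, ref, alt = vid.split(":")
        for sample, alt_dosage in zip(samples, dense[vid]):
            prs[sample] += weight * effect_dosage(alt_dosage, effect, ref, alt)
    return prs


def prs_sparse(samples: List[str], sparse: Dict[str, Dict[str, int]], score: Score) -> Dict[str, float]:
    """Sparse: visit only stored non-reference calls; absent entries are assumed hom-ref."""
    offset = 0.0  # summed hom-ref contribution, shared by every sample
    delta = {s: 0.0 for s in samples}  # per-sample departure from the hom-ref baseline
    for vid, (effect, weight) in score.items():
        _, _, ref, alt = vid.split(":")
        hom_ref = weight * effect_dosage(0, effect, ref, alt)  # nonzero only if effect == ref
        offset += hom_ref
        for sample, alt_dosage in sparse.get(vid, {}).items():
            delta[sample] += weight * effect_dosage(alt_dosage, effect, ref, alt) - hom_ref
    return {s: offset + delta[s] for s in samples}


# Toy example: three samples; one alt-effect variant and one ref-effect variant.
samples = ["s1", "s2", "s3"]
score: Score = {"1:1000:A:G": ("G", 0.5), "2:2000:C:T": ("C", -0.2)}
dense = {"1:1000:A:G": [0, 1, 2], "2:2000:C:T": [0, 0, 1]}
sparse = {"1:1000:A:G": {"s2": 1, "s3": 2}, "2:2000:C:T": {"s3": 1}}

dense_scores = prs_dense(samples, dense, score)
sparse_scores = prs_sparse(samples, sparse, score)
for s in samples:
    assert math.isclose(dense_scores[s], sparse_scores[s])
    print(s, round(sparse_scores[s], 3))  # prints: s1 -0.4, s2 0.1, s3 0.8
```

In this sketch, variants whose effect allele is the reference allele are handled through a single shared offset rather than by adding twice the weight to every sample, so per-variant work stays proportional to the number of non-reference carriers. Missing genotypes, multi-allelic sites and strand flips are deliberately out of scope.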