Sara Becelaere, Yasmina Abakkouy, Deborah Sarah Jans, Margaux David, Séverine Vermeire, Isabelle Cleynen
Genome Medicine 17(1):119. Published 2025-10-09. DOI: https://doi.org/10.1186/s13073-025-01539-0. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12512491/pdf/
STREAM-PRS: a multi-tool pipeline for streamlining polygenic risk score computation.
Background: Polygenic risk scores (PRS) offer an elegant approach to estimating an individual's genetic predisposition to a given disease or trait. Numerous tools are available for PRS calculation, each applying different strategies to account for linkage disequilibrium and effect size shrinkage. No single tool is inherently superior. Therefore, multiple tools should be tested to identify the one that best suits the research question. Additionally, challenges such as population stratification and PRS portability further complicate the field. Here, we developed STREAM-PRS, a PRS pipeline designed to calculate scores using five popular tools: PRSice-2, PRS-CS, LDpred2, lassosum, and lassosum2.
Methods: STREAM-PRS first computes scores under various settings in a training dataset. The selected variants are subsequently used for score calculation in the test dataset, followed by principal component (PC) correction and standardization to improve portability across different centers. Finally, the pipeline determines the best PRS tool and settings based on the variance explained (R²) in the test dataset. To demonstrate this PRS pipeline, we applied it to an in-house inflammatory bowel disease (IBD) cohort consisting of 3192 IBD cases and 822 controls. In total, 472 scores were created using the 1000 Genomes non-Finnish European subpopulation as training data and applied to UK Biobank data as the test dataset.
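The PC-correction, standardization, and selection steps described in the Methods can be sketched roughly as follows. This is a minimal illustration in plain Python, not the STREAM-PRS implementation: the function names (`pc_correct`, `standardize`, `pick_best`), the toy data, and the use of the squared Pearson correlation as a stand-in for R² are all assumptions, since the abstract does not state which regression model or R² definition the pipeline uses.

```python
from statistics import mean, pstdev


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def pc_correct(scores, pcs):
    """Return residuals of an OLS fit of the raw scores on intercept + PCs."""
    design = [[1.0] + list(row) for row in pcs]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, scores)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    return [y - sum(b * v for b, v in zip(beta, r)) for r, y in zip(design, scores)]


def standardize(values):
    """Scale values to mean 0 and (population) standard deviation 1."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def r_squared(x, y):
    """Squared Pearson correlation (assumed stand-in for the paper's R²)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def pick_best(candidates, pcs, phenotype):
    """PC-correct and standardize each candidate score, then rank by R²."""
    results = {
        name: r_squared(standardize(pc_correct(raw, pcs)), phenotype)
        for name, raw in candidates.items()
    }
    return max(results, key=results.get), results


# Toy data (hypothetical): 6 individuals, 2 PCs, binary phenotype.
toy_pcs = [[-1.0, 0.5], [-0.5, -1.0], [0.0, 1.0], [0.5, -0.5], [1.0, 0.0], [1.5, 0.8]]
toy_phenotype = [0, 0, 1, 0, 1, 1]
toy_candidates = {
    # tracks the phenotype but is shifted along PC1 (toy stratification)
    "tool_a": [-2.0, -1.0, 1.0, 1.0, 3.0, 4.0],
    # mostly unrelated to the phenotype
    "tool_b": [0.3, 0.2, 0.1, 0.4, 0.5, -0.2],
}

best_tool, r2_by_tool = pick_best(toy_candidates, toy_pcs, toy_phenotype)
print(best_tool, r2_by_tool)
```

In this sketch every candidate score is residualized on the same PCs before comparison, so a score cannot rank highly merely because it tracks ancestry; a real analysis would use many more individuals and the pipeline's own model.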
Results: Running STREAM-PRS for 472 scores across the 5 PRS tools, with 404 individuals in the training dataset and 1000 individuals in the test dataset, takes approximately 20 h to complete. For IBD, lassosum was identified as the best-performing tool, with the following optimal settings: a shrinkage value of 0.7 and a lambda value of 0.008859. Applying this optimized PRS to our in-house IBD dataset (validation) resulted in an R² of 0.203 and an AUC of 0.75. Further, the PRS showed a high positive predictive value of 0.905 but a low negative predictive value of 0.341. This suggests that the PRS is effective in identifying individuals at high risk but might be less reliable in excluding lower-risk individuals.
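When reading the predictive values above, it may help to recall that PPV and NPV depend on the fraction of cases in the dataset as well as on the classifier itself. The sketch below shows the standard relationship. The function name `predictive_values` and the sensitivity and specificity values are hypothetical choices for illustration and are not taken from the paper; only the case and control counts (3192 and 822) come from the abstract.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary classifier at a given case fraction."""
    tp = sensitivity * prevalence              # true positives (as a fraction)
    fn = (1 - sensitivity) * prevalence        # false negatives
    tn = specificity * (1 - prevalence)        # true negatives
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    return tp / (tp + fp), tn / (tn + fn)


# Case fraction in the validation cohort, from the counts in the abstract.
cases, controls = 3192, 822
cohort_prevalence = cases / (cases + controls)

# Hypothetical operating point, NOT reported in the paper.
assumed_sensitivity, assumed_specificity = 0.70, 0.70

for label, prevalence in [
    ("cohort case fraction", cohort_prevalence),
    ("balanced 50/50 sample", 0.5),
]:
    ppv, npv = predictive_values(assumed_sensitivity, assumed_specificity, prevalence)
    print(f"{label}: prevalence={prevalence:.3f} PPV={ppv:.3f} NPV={npv:.3f}")
```

With the same assumed sensitivity and specificity, a higher case fraction raises PPV and lowers NPV, which is one reason predictive values from a case-enriched cohort may not carry over directly to a population with a different disease prevalence.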
Conclusions: Overall, STREAM-PRS provides an efficient framework for selecting the best PRS calculation strategy and helps bridge the portability gap within the PRS field. STREAM-PRS is available at https://github.com/SaraBecelaere/STREAM-PRS.
About the journal:
Genome Medicine is an open access journal that publishes outstanding research applying genetics, genomics, and multi-omics to understand, diagnose, and treat disease. Bridging basic science and clinical research, it covers areas such as cancer genomics, immuno-oncology, immunogenomics, infectious disease, microbiome, neurogenomics, systems medicine, clinical genomics, gene therapies, precision medicine, and clinical trials. The journal publishes original research, methods, software, and reviews to serve authors and promote broad interest and importance in the field.