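Using machine learning and single nucleotide polymorphisms for improving rheumatoid arthritis risk prediction in postmenopausal women.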

PLOS Digital Health, 4(4): e0000790. Pub Date: 2025-04-09. eCollection Date: 2025-04-01. DOI: 10.1371/journal.pdig.0000790
Yingke Xu, Qing Wu
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11981130/pdf/

Using machine learning and single nucleotide polymorphisms for improving rheumatoid arthritis risk prediction in postmenopausal women.

Genetic factors account for 60-70% of the variability in rheumatoid arthritis (RA), yet few studies have used genetic variants to predict RA risk. This study aimed to improve RA risk prediction by applying machine-learning algorithms to single nucleotide polymorphisms (SNPs) in Women's Health Initiative data. We developed four predictive models: 1) common RA risk factors only, 2) model 1 plus polygenic risk scores (PRS) with principal components, 3) model 1 plus SNPs after feature reduction, and 4) model 1 plus SNPs with kernel principal component analysis. Each model was assessed using logistic regression (LR), random forest (RF), eXtreme Gradient Boosting (XGBoost), and support vector machine (SVM). Performance metrics included the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive and negative predictive values (PPV and NPV), and F1 score. The fourth model, integrating SNPs with XGBoost, outperformed all other models. The XGBoost model that combines genomic data with conventional phenotypic predictors significantly enhanced predictive accuracy, achieving the highest AUC of 0.90 and an F1 score of 0.83. The DeLong test confirmed significant differences in AUC between this model and the others (p-values < 0.0001), highlighting its efficacy in utilizing complex genetic information. These findings emphasize the advantage of combining in-depth genomic data with advanced machine learning for RA risk prediction. The XGBoost model that integrated both conventional risk factors and individual SNPs performed most robustly, demonstrating its potential as a tool in personalized medicine for complex diseases like RA. This approach offers a more nuanced and effective RA risk assessment strategy, and further studies are needed to extend it to broader applications.
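
The performance metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names `binary_metrics` and `auc` are hypothetical, only the Python standard library is used, and any data passed in would be made up for illustration. The AUC here is computed as the fraction of (case, non-case) pairs in which the case receives the higher score, with ties counted as one half, which is equivalent to the area under the ROC curve.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, PPV, NPV and F1 from hard 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num: int, den: int) -> float:
        # Return NaN instead of raising when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (case, non-case) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

The pairwise AUC is quadratic in sample size, so it suits small illustrations only; a cohort-scale analysis would normally rely on a rank-based implementation from an established library.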
