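Predicting Inhibitor Development in Hemophilia 'A' using Machine Learning: A Comprehensive Approach to Data Preprocessing, Balancing, and Biomarker Identification Using AI on the CHAMP Dataset.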

IF 2.2, Zone 4 (Medicine), JCR Q4 (Biochemistry & Molecular Biology)
Vikalp Kumar Singh, Maheshwari Prasad Singh
Journal: Current Pharmaceutical Biotechnology. Publication type: Journal Article. Published: 2025-04-21. DOI: 10.2174/0113892010366485250415101928 (https://doi.org/10.2174/0113892010366485250415101928)
Citations: 0

Abstract

Predicting Inhibitor Development in Hemophilia 'A' using Machine Learning: A Comprehensive Approach to Data Preprocessing, Balancing, and Biomarker Identification Using AI on the CHAMP Dataset.

Background: Hemophilia 'A' (HA) is a genetic blood disorder characterized by a deficiency of Factor VIII (FVIII), with treatment often triggering the development of neutralizing antibodies (inhibitors) to FVIII. Predicting the development of these inhibitors is crucial for clinical applications but presents significant computational challenges due to data imbalance, skewed data, and inadequate data sanitization.

Objective: This study aimed to develop a machine-learning/AI approach to find biomarkers and predict the development of inhibitors to Factor VIII in patients with Hemophilia 'A,' addressing the challenges associated with data imbalance and enhancing prediction accuracy.

Methods: The data were sanitized and encoded for prediction, and the Random Over-sampling (ROS) technique was employed to resolve data imbalance in the CHAMP dataset. Several machine learning classification models, including Random Forest, XGBoost, CatBoost, Logistic Regression, Gradient Boosting, and LightGBM, were utilized. Hyperparameters were tuned using GridSearchCV with stratified k-fold cross-validation. The performance of the models was evaluated based on accuracy, precision, recall, and F1 scores. The Random Forest model was further analyzed using an explainable AI (XAI) tool known as SHAP (SHapley Additive exPlanations) to identify the variables influencing model performance.
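The two data-handling steps named in the Methods, random over-sampling and stratified k-fold splitting, can be illustrated with a minimal standard-library sketch. This is not the authors' code: the abstract names GridSearchCV (a scikit-learn class) but does not describe the implementation, so the function names `random_oversample` and `stratified_folds`, the toy data, and the choice to over-sample only the training portion are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    target = max(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample with replacement to fill the gap to the majority size.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def stratified_folds(labels, k, seed=0):
    """Assign each sample index to one of k folds so that class
    proportions stay roughly equal across folds (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin into folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Toy data (invented): 8 samples without inhibitors (0), 2 with (1).
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

folds = stratified_folds(y, k=2)
train_idx = folds[0]
X_bal, y_bal = random_oversample(
    [X[i] for i in train_idx], [y[i] for i in train_idx]
)
balanced_counts = Counter(y_bal)
```

In this sketch the over-sampling is applied to one fold used as training data, so duplicated rows cannot appear on both sides of a split. The abstract does not state at which stage the authors applied ROS.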

Results: The Random Forest model outperformed other classifiers, achieving a mean accuracy of 97.37%, along with closely aligned precision, recall, and F1 scores. The XAI tool SHAP facilitated the ranking of variables, including Clinical Severity, Variant Type, Exon, HGVS cDNA, and hg19 Coordinates, according to their impact on the model's predictions. Additionally, the study identified biomarkers associated with FVIII inhibition.
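For readers unfamiliar with the four reported metrics, the following sketch shows how accuracy, precision, recall, and F1 are derived from a binary confusion matrix. The function name `binary_metrics` and the labels below are invented for illustration and are unrelated to the CHAMP results.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary task
    (hypothetical helper; `positive` marks the class of interest)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Invented labels: 3 positive and 5 negative samples.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

On imbalanced data, accuracy alone can look high even when the minority class is predicted poorly, which is one reason to report precision, recall, and F1 alongside it.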

Conclusion: This study presents a breakthrough in the early prediction of inhibitor development in Hemophilia 'A' patients, paving the way for personalized and effective treatment programs. The integration of the preprocessing pipeline, Random Forest model, and SHAP analysis offers a novel solution for guiding treatment strategies for HA patients, which could significantly enhance the development of targeted and effective therapies.

Source journal
Current Pharmaceutical Biotechnology (Medicine: Biochemistry & Molecular Biology)
CiteScore: 5.60
Self-citation rate: 3.60%
Articles published: 203
Review time: 6 months
Journal description: Current Pharmaceutical Biotechnology aims to cover all the latest and outstanding developments in Pharmaceutical Biotechnology. Each issue of the journal includes timely in-depth reviews, original research articles and letters written by leaders in the field, covering a range of current topics in scientific areas of Pharmaceutical Biotechnology. Invited and unsolicited review articles are welcome. The journal encourages contributions describing research at the interface of drug discovery and pharmacological applications, involving in vitro investigations and pre-clinical or clinical studies. Scientific areas within the scope of the journal include pharmaceutical chemistry, biochemistry and genetics, molecular and cellular biology, and polymer and materials sciences as they relate to pharmaceutical science and biotechnology. In addition, the journal also considers comprehensive studies and research advances pertaining to food chemistry with pharmaceutical implications. Areas of interest include: DNA/protein engineering and processing; synthetic biotechnology; omics (genomics, proteomics, metabolomics and systems biology); therapeutic biotechnology (gene therapy, peptide inhibitors, enzymes); drug delivery and targeting; nanobiotechnology; molecular pharmaceutics and molecular pharmacology; analytical biotechnology (biosensing, advanced technology for detection of bioanalytes); pharmacokinetics and pharmacodynamics; applied microbiology; bioinformatics (computational biopharmaceutics and modeling); environmental biotechnology; regenerative medicine (stem cells, tissue engineering and biomaterials); translational immunology (cell therapies, antibody engineering, xenotransplantation); industrial bioprocesses for drug production and development; biosafety; biotech ethics. Special Issues devoted to crucial topics, providing the latest comprehensive information on cutting-edge areas of research and technological advances, are welcome.
Current Pharmaceutical Biotechnology is an essential journal for academic, clinical, government and pharmaceutical scientists who wish to be kept informed and up-to-date with the latest and most important developments.