Predicting Inhibitor Development in Hemophilia 'A' using Machine Learning: A Comprehensive Approach to Data Preprocessing, Balancing, and Biomarker Identification Using AI on the CHAMP Dataset.
Vikalp Kumar Singh, Maheshwari Prasad Singh
Current Pharmaceutical Biotechnology, published 2025-04-21. DOI: 10.2174/0113892010366485250415101928
Background: Hemophilia 'A' (HA) is a genetic bleeding disorder caused by a deficiency of Factor VIII (FVIII), and treatment with FVIII often triggers the development of neutralizing antibodies (inhibitors) against it. Predicting which patients will develop these inhibitors is clinically important but computationally challenging because of class imbalance, skewed data, and inadequate data sanitization.
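Why imbalance is a problem for evaluation can be shown with a small sketch. The labels below are hypothetical (10% positives, chosen only for illustration, not the CHAMP inhibitor rate): a degenerate classifier that always predicts "no inhibitor" reaches high accuracy while missing every patient who develops one, which is why precision, recall and F1 are reported alongside accuracy.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary task, with `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # Precision/recall are defined as 0 when nothing is predicted positive / nothing is actually positive.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = inhibitor developed, 0 = no inhibitor.
y_true = [1] * 10 + [0] * 90
# A "model" that always predicts the majority class.
y_majority = [0] * 100

print(classification_metrics(y_true, y_majority))
# prints {'accuracy': 0.9, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
```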
Objective: This study aimed to develop a machine-learning/AI approach to identify biomarkers and predict the development of inhibitors to Factor VIII in patients with Hemophilia 'A', while addressing data imbalance and improving prediction accuracy.
Methods: The data were sanitized and encoded for prediction, and Random Over-Sampling (ROS) was used to resolve class imbalance in the CHAMP dataset. Several machine-learning classification models were evaluated: Random Forest, XGBoost, CatBoost, Logistic Regression, Gradient Boosting, and LightGBM. Hyperparameters were tuned with GridSearchCV using stratified k-fold cross-validation. Model performance was evaluated by accuracy, precision, recall, and F1 score. The Random Forest model was further analyzed with the explainable AI (XAI) tool SHAP (SHapley Additive exPlanations) to identify the variables that most influence the model's predictions.
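The balancing and cross-validation steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not say where in the workflow ROS was applied, so this sketch (an assumption) oversamples only the training part of each stratified fold and leaves the validation fold untouched, so that duplicated rows never appear on both sides of a split. The function names and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes (with replacement) until all classes match the largest."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def stratified_k_fold(y, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs whose folds roughly preserve the overall class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's indices round-robin across the k folds.
        for j, idx in enumerate(indices):
            folds[j % k].append(idx)
    for f in range(k):
        val = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, val


# Hypothetical toy data: one numeric feature, 4 positives among 20 rows.
X = [[i] for i in range(20)]
y = [1 if i < 4 else 0 for i in range(20)]

for train_idx, val_idx in stratified_k_fold(y, k=4):
    X_tr, y_tr = random_oversample([X[i] for i in train_idx], [y[i] for i in train_idx])
    # Columns: validation size, validation positives, training negatives, training positives after ROS.
    print(len(val_idx), sum(y[i] for i in val_idx), y_tr.count(0), y_tr.count(1))
# prints "5 1 12 12" once per fold
```

In a scikit-learn workflow the same arrangement is usually expressed by placing imbalanced-learn's `RandomOverSampler` inside an `imblearn.pipeline.Pipeline` and passing that pipeline to `GridSearchCV` with `cv=StratifiedKFold(...)`; samplers in such a pipeline are applied only when fitting, so each validation fold is scored on original rows.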
Results: The Random Forest model outperformed the other classifiers, achieving a mean accuracy of 97.37%, with closely aligned precision, recall, and F1 scores. SHAP analysis ranked variables such as Clinical Severity, Variant Type, Exon, HGVS cDNA, and hg19 Coordinates by their impact on the model's predictions. The study also identified biomarkers associated with FVIII inhibition.
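The idea behind a SHAP ranking can be illustrated with exact Shapley values computed by brute force. This is a sketch under loud assumptions: the scoring function, its weights, the single all-zero baseline and the encoded inputs are all made up, and the feature names only mirror the abstract's variables. It is not TreeSHAP (what `shap.TreeExplainer` uses for tree ensembles such as Random Forest) and not the authors' computation; it shows how each feature's contribution is averaged over coalitions and how mean absolute values give a global ranking.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x`; features outside a coalition take their `baseline` value.

    Enumerates all coalitions, so it is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def rank_features(model, rows, baseline, names):
    """Rank features by mean absolute Shapley value over `rows` (a global importance summary)."""
    sums = [0.0] * len(names)
    for x in rows:
        for i, v in enumerate(shapley_values(model, x, baseline)):
            sums[i] += abs(v)
    means = [s / len(rows) for s in sums]
    return sorted(zip(names, means), key=lambda pair: pair[1], reverse=True)


# Hypothetical encoded features and a made-up model standing in for a trained classifier.
names = ["clinical_severity", "variant_type", "exon"]


def toy_model(z):
    return 0.6 * z[0] + 0.3 * z[1] + 0.1 * z[0] * z[2]


baseline = [0.0, 0.0, 0.0]
rows = [[1.0, 1.0, 1.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
for name, importance in rank_features(toy_model, rows, baseline, names):
    print(f"{name}: {importance:.3f}")
# clinical_severity: 0.433
# variant_type: 0.200
# exon: 0.033
```

The interaction term `0.1 * z[0] * z[2]` is split equally between the two features involved, and the values for each row sum to `model(x) - model(baseline)`, the additivity property that SHAP explanations are built on.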
Conclusion: This study presents a breakthrough in the early prediction of inhibitor development in Hemophilia 'A' patients, paving the way for personalized and effective treatment programs. The integration of the preprocessing pipeline, Random Forest model, and SHAP analysis offers a novel solution for guiding treatment strategies for HA patients, which could significantly enhance the development of targeted and effective therapies.
About the journal:
Current Pharmaceutical Biotechnology aims to cover all the latest and outstanding developments in Pharmaceutical Biotechnology. Each issue of the journal includes timely in-depth reviews, original research articles and letters written by leaders in the field, covering a range of current topics in scientific areas of Pharmaceutical Biotechnology. Invited and unsolicited review articles are welcome. The journal encourages contributions describing research at the interface of drug discovery and pharmacological applications, involving in vitro investigations and pre-clinical or clinical studies. Scientific areas within the scope of the journal include pharmaceutical chemistry, biochemistry and genetics, molecular and cellular biology, and polymer and materials sciences as they relate to pharmaceutical science and biotechnology. In addition, the journal also considers comprehensive studies and research advances pertaining to food chemistry with pharmaceutical implications. Areas of interest include:
DNA/protein engineering and processing
Synthetic biotechnology
Omics (genomics, proteomics, metabolomics and systems biology)
Therapeutic biotechnology (gene therapy, peptide inhibitors, enzymes)
Drug delivery and targeting
Nanobiotechnology
Molecular pharmaceutics and molecular pharmacology
Analytical biotechnology (biosensing, advanced technology for detection of bioanalytes)
Pharmacokinetics and pharmacodynamics
Applied microbiology
Bioinformatics (computational biopharmaceutics and modeling)
Environmental biotechnology
Regenerative medicine (stem cells, tissue engineering and biomaterials)
Translational immunology (cell therapies, antibody engineering, xenotransplantation)
Industrial bioprocesses for drug production and development
Biosafety
Biotech ethics
Special Issues devoted to crucial topics, providing the latest comprehensive information on cutting-edge areas of research and technological advances, are welcome.
Current Pharmaceutical Biotechnology is an essential journal for academic, clinical, government and pharmaceutical scientists who wish to be kept informed and up-to-date with the latest and most important developments.