A Novel Model Using ML Techniques for Clinical Trial Design and Expedited Patient Onboarding Process.

Impact factor: 2.2; JCR category: Health Care Sciences & Services (Q3)

ClinicoEconomics and Outcomes Research, vol. 17, pp. 1-18. Published: 2025-01-16; eCollection date: 2025-01-01. DOI: 10.2147/CEOR.S479603

Abhirvey Iyer, Sundaravalli Narayanaswami


Citations: 0

Abstract

Introduction: Clinical trials are critical for drug development and patient care; however, their design and patient enrolment processes are often inefficient. This research explores integrating machine learning (ML) techniques to address these challenges. Specifically, the study investigates ML models for two critical aspects: (1) streamlining clinical trial design parameters (such as the site of drug action and the type of interventional or observational model) and (2) optimizing patient/volunteer enrolment for trials through efficient classification techniques.

Methods: The study used two datasets. The first, with 55,000 samples from ClinicalTrials.gov, was divided into five subsets (10,000-15,000 rows each) for model evaluation and focused on trial parameter optimization. The second, from the UCI Machine Learning Repository, targeted patient eligibility classification. Five ML models were applied to both datasets: XGBoost, Random Forest, Support Vector Classifier (SVC), Logistic Regression, and Decision Tree; an Artificial Neural Network (ANN) was additionally applied to the second dataset. Model performance was evaluated using precision, recall, balanced accuracy, ROC-AUC, and weighted F1-score, with results averaged across the folds of k-fold cross-validation.
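The evaluation protocol described in the Methods (k-fold cross-validation, with precision, recall, balanced accuracy, ROC-AUC, and weighted F1-score averaged across folds) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes binary labels coded 0/1, reports precision and recall for the positive class (the abstract does not say which averaging was used), uses contiguous unshuffled folds for brevity, and relies on a hypothetical `fit` interface and a hypothetical toy model, `fit_midpoint`, standing in for XGBoost, Random Forest, and the other classifiers.

```python
from collections import Counter
from statistics import mean


def per_class_stats(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} for each label in y_true."""
    support = Counter(y_true)
    stats = {}
    for label in support:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        stats[label] = (precision, recall, f1, support[label])
    return stats


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall, so a majority class cannot dominate."""
    return mean(recall for _, recall, _, _ in per_class_stats(y_true, y_pred).values())


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to each class's support."""
    stats = per_class_stats(y_true, y_pred)
    return sum(f1 * n for _, _, f1, n in stats.values()) / len(y_true)


def roc_auc(y_true, scores):
    """Binary ROC-AUC: chance a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs both classes in the evaluation fold")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous, unshuffled folds."""
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n))
        yield train, test
        start += size


def cross_validate(X, y, fit, k=5, threshold=0.5):
    """Average the five metrics named in the abstract over k folds.

    `fit` is a hypothetical interface: fit(X_train, y_train) must return a
    function mapping one sample to a positive-class score in [0, 1].
    """
    per_fold = []
    for train, test in k_fold_indices(len(y), k):
        score = fit([X[i] for i in train], [y[i] for i in train])
        y_true = [y[i] for i in test]
        scores = [score(X[i]) for i in test]
        y_pred = [1 if s >= threshold else 0 for s in scores]
        # Precision/recall for the positive class (label 1); an assumption.
        precision, recall, _, _ = per_class_stats(y_true, y_pred).get(1, (0.0, 0.0, 0.0, 0))
        per_fold.append({
            "precision": precision,
            "recall": recall,
            "balanced_accuracy": balanced_accuracy(y_true, y_pred),
            "roc_auc": roc_auc(y_true, scores),
            "weighted_f1": weighted_f1(y_true, y_pred),
        })
    return {name: mean(fold[name] for fold in per_fold) for name in per_fold[0]}


def fit_midpoint(X_train, y_train):
    """Hypothetical toy model: cut feature 0 halfway between the class means
    (assumes class 1 has the larger mean)."""
    m0 = mean(x[0] for x, t in zip(X_train, y_train) if t == 0)
    m1 = mean(x[0] for x, t in zip(X_train, y_train) if t == 1)
    cut = (m0 + m1) / 2
    return lambda x: 1.0 if x[0] > cut else 0.0
```

For example, `cross_validate(X, y, fit_midpoint, k=5)` returns a dictionary of fold-averaged scores keyed by metric name. In practice one would usually shuffle or stratify the folds so that every fold contains both classes; this sketch's `roc_auc` raises a `ValueError` when a fold does not.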

Results: In the first phase, XGBoost and Random Forest were the best-performing models across all five subsets, achieving an average balanced accuracy of 0.71 and an average ROC-AUC of 0.7. On the second dataset, both SVC and ANN performed well, but ANN was preferred for its scalability to larger datasets. ANN achieved a test accuracy of 0.73714, demonstrating its potential for real-world use in streamlining patient enrolment.
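The abstract does not describe the ANN's architecture or training setup. As a purely illustrative sketch, the snippet below trains a tiny feed-forward network (one hidden layer of sigmoid units, a sigmoid output, log loss, and plain stochastic gradient descent) in standard-library Python and scores it by accuracy on a held-out test set, the kind of figure reported above. The class name `TinyMLP`, the hidden-layer size, the learning rate, and the epoch count are hypothetical choices made for brevity, not the study's.

```python
import math
import random


def sigmoid(z):
    """Logistic function; z is clamped to avoid math.exp overflow."""
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """Hypothetical one-hidden-layer network: sigmoid hidden units, sigmoid output,
    log loss, plain stochastic gradient descent. Sizes and rates are illustrative."""

    def __init__(self, n_inputs, n_hidden=4, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        output = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, output

    def predict_proba(self, x):
        """Estimated probability that sample x belongs to class 1."""
        return self._forward(x)[1]

    def fit(self, X, y, epochs=500, lr=0.5, seed=0):
        rng = random.Random(seed)
        order = list(range(len(X)))
        for _ in range(epochs):
            rng.shuffle(order)  # visit samples in a new random order each epoch
            for i in order:
                x, target = X[i], y[i]
                hidden, output = self._forward(x)
                # With a sigmoid output and log loss, dLoss/dz_out = output - target.
                d_out = output - target
                for j, h in enumerate(hidden):
                    # Backpropagate through w2[j] before that weight is updated.
                    d_hidden = d_out * self.w2[j] * h * (1.0 - h)
                    self.w2[j] -= lr * d_out * h
                    self.w1[j] = [w - lr * d_hidden * xi for w, xi in zip(self.w1[j], x)]
                    self.b1[j] -= lr * d_hidden
                self.b2 -= lr * d_out
        return self


def accuracy(model, X, y, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the 0/1 label."""
    hits = sum((model.predict_proba(x) >= threshold) == bool(t) for x, t in zip(X, y))
    return hits / len(y)
```

For example, `model = TinyMLP(n_inputs=1).fit(X_train, y_train)` followed by `accuracy(model, X_test, y_test)` gives the held-out accuracy. A real eligibility classifier would more likely use a library implementation such as scikit-learn's `MLPClassifier` or PyTorch, together with feature scaling and early stopping.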

Discussion: The study highlights the effectiveness of ML models in improving clinical trial workflows. XGBoost and Random Forest performed robustly on large clinical datasets for optimizing trial parameters, while ANN proved advantageous for patient eligibility classification because of its scalability. These findings underscore the potential of ML to enhance decision-making, reduce delays, and improve the accuracy of clinical trial outcomes. As ML technology continues to evolve, its integration into clinical research could drive innovation and improve patient care.
