Clinical Laboratory Parameter-Driven Machine Learning for Participant Selection in Bioequivalence Studies Among Patients With Gastric Cancer: Framework Development and Validation Study.

Impact factor: 2.0
JMIR AI Pub Date : 2025-05-05 DOI:10.2196/64845
Byungeun Shon, Sook Jin Seong, Eun Jung Choi, Mi-Ri Gwon, Hae Won Lee, Jaechan Park, Ho-Young Chung, Sungmoon Jeong, Young-Ran Yoon
Journal: JMIR AI, volume 4, article e64845. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12223687/pdf/
Citations: 0

Abstract

Background: Insufficient participant enrollment is a major factor responsible for clinical trial failure.

Objective: We formulated a machine learning (ML)-based framework using clinical laboratory parameters to identify participants eligible for enrollment in a bioequivalence study.

Methods: We acquired records of 11,592 patients with gastric cancer from the electronic medical records of Kyungpook National University Hospital in Korea. The ML model was developed using 8 clinical laboratory parameters, including complete blood count and liver and kidney function tests, along with the dates of acquisition. Two datasets were collected: (1) a training dataset to design an ML-based candidate selection method and (2) a test dataset to evaluate the performance of the proposed method. The generalization performance of the ML-based method was confirmed using the F1-score and the area under the curve (AUC). The proposed model was compared with a random selection method to evaluate its efficacy in recruiting participants.
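
As a minimal illustration of the two evaluation metrics named in the Methods (this is not the authors' code), the following Python sketch computes an F1-score and an AUC from binary eligibility labels and predicted scores. The toy labels, the scores, the 0.5 decision threshold, and the function names are all assumptions chosen for illustration; the abstract does not state how the metrics were implemented.

```python
def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random
    negative, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = valid candidate, 0 = not valid.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

# Assumed threshold of 0.5 to turn scores into yes/no predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"F1 = {f1:.3f}, AUC = {auc:.3f}")
```

The F1-score depends on the chosen threshold, whereas the AUC depends only on how the scores rank valid candidates against invalid ones, which is why the two are usually reported together.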

Results: The weighted ensemble model achieved strong performance, with both the F1-score and the AUC above 0.8, demonstrating its ability to accurately identify valid clinical trial candidates while minimizing misclassification. Its high sensitivity further enhanced the model's efficiency in prioritizing patients for screening. In a case study, the proposed ML model reduced the screening workload by 57%: it identified 150 valid patients from a pool of 209, whereas random selection required 485 patients.
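
The case-study figures can be checked with a short calculation. The Python sketch below assumes that the reported 57% workload reduction is the relative decrease in the number of patients screened (209 with the model versus 485 with random selection); the abstract does not spell out the formula, so this reading is an assumption that happens to reproduce the reported value. The per-method yields are derived here for illustration and are not reported in the abstract.

```python
def workload_reduction(screened_with_model, screened_at_random):
    """Relative decrease in patients screened (assumed definition)."""
    return 1 - screened_with_model / screened_at_random


def screening_yield(valid, screened):
    """Fraction of screened patients who turn out to be valid."""
    return valid / screened


# Figures taken from the Results paragraph.
VALID_TARGET = 150      # valid patients identified
ML_SCREENED = 209       # patients screened with the ML model
RANDOM_SCREENED = 485   # patients screened with random selection

reduction = workload_reduction(ML_SCREENED, RANDOM_SCREENED)
ml_yield = screening_yield(VALID_TARGET, ML_SCREENED)
random_yield = screening_yield(VALID_TARGET, RANDOM_SCREENED)

print(f"Workload reduction: {reduction:.1%}")
print(f"Yield with ML model: {ml_yield:.1%}")
print(f"Yield with random selection: {random_yield:.1%}")
```

Under this reading, the reduction rounds to the reported 57%, and the model-based list contains valid candidates at more than twice the rate of a randomly ordered list.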

Conclusions: The proposed ML-based framework using clinical laboratory parameters can be used to identify patients eligible for a clinical trial, enabling faster participant enrollment.
