Enhancing hepatopathy clinical trial efficiency: a secure, large language model-powered pre-screening pipeline.

IF 6.1 · Zone 3 (Biology) · Q1, MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biodata Mining · Pub Date: 2025-06-14 · DOI: 10.1186/s13040-025-00458-5

Xiongbin Gui, Hanlin Lv, Xiao Wang, Longting Lv, Yi Xiao, Lei Wang

Biodata Mining 18(1):42. Publication type: Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12167571/pdf/

Citations: 0

Abstract

Background: Recruitment for cohorts involving complex liver diseases, such as hepatocellular carcinoma and liver cirrhosis, often requires interpreting semantically complex criteria. Traditional manual screening methods are time-consuming and prone to errors. While AI-powered pre-screening offers potential solutions, challenges remain regarding accuracy, efficiency, and data privacy.

Methods: We developed a novel patient pre-screening pipeline that leverages clinical expertise to guide the precise, safe, and efficient application of large language models. The pipeline breaks down complex criteria into a series of composite questions and then employs two strategies to perform semantic question-answering over electronic health records: (1) Pathway A, the Anthropomorphized Experts' Chain of Thought strategy; and (2) Pathway B, the Preset Stances within an Agent Collaboration strategy, aimed particularly at complex clinical reasoning scenarios. The pipeline is evaluated on key metrics, including precision, recall, time consumption, and counterfactual inference, at both the question and criterion levels.
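The decomposition step described in the Methods can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Criterion` class, the `keyword_answerer` stub (a stand-in for an LLM pathway), and the rule that a criterion is met only when every question is answered "yes" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical answerer type: maps (question, record text) to True/False.
# In the paper this role is played by an LLM pathway; here it is a stub.
Answerer = Callable[[str, str], bool]


@dataclass
class Criterion:
    """An eligibility criterion broken into simpler questions (hypothetical structure)."""
    name: str
    questions: List[str]


def keyword_answerer(question: str, record: str) -> bool:
    """Toy stand-in for semantic QA: true if the question text occurs in the record."""
    return question.lower() in record.lower()


def screen(criterion: Criterion, record: str, answer: Answerer) -> Dict[str, object]:
    """Answer each question, then combine the answers with logical AND (assumed rule)."""
    answers = {q: answer(q, record) for q in criterion.questions}
    return {
        "criterion": criterion.name,
        "answers": answers,
        "met": all(answers.values()),
    }


if __name__ == "__main__":
    record = "Diagnosis: liver cirrhosis. Child-Pugh class A. No prior transplant."
    criterion = Criterion(
        name="compensated cirrhosis",
        questions=["liver cirrhosis", "Child-Pugh class A"],
    )
    print(screen(criterion, record, keyword_answerer))
```

Keeping the per-question answers alongside the criterion-level verdict is what makes evaluation at both the question and criterion levels possible.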

Results: Our pipeline achieved a notable balance of high precision (e.g., 0.921, criterion level) and good overall recall (e.g., ~0.82, criterion level), alongside high efficiency (0.44 s per task). Pathway B excelled in high-precision complex reasoning (while exhibiting a specific recall profile conducive to accuracy), whereas Pathway A was particularly effective for tasks requiring both robust precision and recall (e.g., direct data extraction), often with faster processing times. Both pathways achieved comparable overall precision while offering different strengths in the precision-recall trade-off. The pipeline showed promising precision-focused results in hepatocellular carcinoma (0.878) and cirrhosis trials (0.843).
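For readers less familiar with the reported metrics, the sketch below shows how precision and recall are conventionally computed from paired predictions and reference labels. The function name and the toy data are illustrative assumptions; the numbers are not taken from the study. The same function applies whether each pair represents one question or one criterion.

```python
from typing import Iterable, Tuple


def precision_recall(pairs: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Compute (precision, recall) from (predicted, actual) boolean pairs."""
    tp = fp = fn = 0
    for predicted, actual in pairs:
        if predicted and actual:
            tp += 1  # correctly flagged as eligible
        elif predicted and not actual:
            fp += 1  # flagged as eligible, but not eligible
        elif actual and not predicted:
            fn += 1  # eligible, but missed
    # Return 0.0 when a denominator is empty rather than dividing by zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up example: 2 true positives, 1 false positive, 1 false negative.
    toy = [(True, True), (True, True), (True, False), (False, True), (False, False)]
    p, r = precision_recall(toy)
    print(f"precision={p:.3f} recall={r:.3f}")
```

A precision-focused pre-screen, as described in the abstract, favours few false positives, so that patients passed to manual review are likely to be truly eligible.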

Conclusions: This data-secure and time-efficient pipeline shows high precision and achieves good recall in hepatopathy trials, providing promising solutions for streamlining clinical trial workflows. Its efficiency, adaptability, and balanced performance profile make it suitable for improving patient recruitment. Its capability to function in resource-constrained environments further enhances its utility in clinical settings.




Source journal

Biodata Mining (MATHEMATICAL & COMPUTATIONAL BIOLOGY)

CiteScore

7.90

Self-citation rate

0.00%

Publication volume

Review time

23 weeks

About the journal: BioData Mining is an open access, open peer-reviewed journal encompassing research on all aspects of data mining applied to high-dimensional biological and biomedical data, focusing on computational aspects of knowledge discovery from large-scale genetic, transcriptomic, genomic, proteomic, and metabolomic data. Topical areas include, but are not limited to:
- Development, evaluation, and application of novel data mining and machine learning algorithms.
- Adaptation, evaluation, and application of traditional data mining and machine learning algorithms.
- Open-source software for the application of data mining and machine learning algorithms.
- Design, development, and integration of databases, software, and web services for the storage, management, retrieval, and analysis of data from large-scale studies.
- Pre-processing, post-processing, modeling, and interpretation of data mining and machine learning results for biological interpretation and knowledge discovery.