Application of a General Large Language Model-Based Classification System to Retrieve Information about Oncological Trials.

IF 1.8 | CAS Zone 3 (Medicine) | JCR Q3 ONCOLOGY

Oncology | Pub Date: 2025-06-13 | DOI: 10.1159/000546946

Fabio Dennstädt, Paul Windisch, Irina Filchenko, Johannes Zink, Paul Martin Putora, Ahmed Shaheen, Roberto Gaio, Nikola Cihoric, Marie Wosny, Stefanie Aeppli, Max Schmerder, Mohamed Shelan, Janna Hastings


Citations: 0

Abstract

Introduction: The automated classification of clinical trials and key categories within the medical literature is increasingly relevant, particularly in oncology, as the volume of publications and trial reports continues to expand. Large language models (LLMs) may provide new opportunities for automating diverse classification tasks, for example general-purpose text classification to retrieve information about oncological trials.

Methods: A general text classification framework was developed in which the prompt, the model, and the classification categories can all be adapted. The framework was tested on four datasets comprising nine binary classification questions related to oncological trials. Evaluation was conducted using a locally hosted Mixtral-8x7B-Instruct v0.1-GPTQ model and three cloud-based LLMs: Mixtral-8x7B-Instruct v0.1, Llama3.1-70B-Instruct, and Qwen-2.5-72B.
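To illustrate what "adaptable prompt, model and categories" can mean in practice, here is a minimal sketch of a prompt-based binary classifier. Everything in it is an assumption for illustration: the template wording, the names `ClassificationTask`, `build_prompt`, `parse_response` and `classify`, and the rule that a reply is valid only if it matches exactly one allowed category. The model is abstracted as any callable from prompt to reply, so a local or cloud-hosted LLM could be plugged in; this is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Hypothetical prompt template; the paper's actual prompt is not reproduced here.
PROMPT_TEMPLATE = (
    "{instruction}\n\n"
    "Text:\n{text}\n\n"
    "Answer with exactly one of: {options}."
)


@dataclass
class ClassificationTask:
    instruction: str            # the classification question (hypothetical wording)
    categories: Sequence[str]   # allowed answers, e.g. ("yes", "no") for a binary question


def build_prompt(task: ClassificationTask, text: str) -> str:
    """Fill the adaptable template with the question, the input text and the categories."""
    return PROMPT_TEMPLATE.format(
        instruction=task.instruction,
        text=text,
        options=", ".join(task.categories),
    )


def parse_response(task: ClassificationTask, response: str) -> Optional[str]:
    """Map a raw model reply to one allowed category, or return None if the reply is invalid."""
    reply = response.strip().lower().rstrip(".")
    matches = [c for c in task.categories if reply == c.lower()]
    return matches[0] if len(matches) == 1 else None


def classify(task: ClassificationTask, text: str,
             model: Callable[[str], str]) -> Optional[str]:
    """Run one classification; `model` is any prompt -> reply callable (assumed interface)."""
    return parse_response(task, model(build_prompt(task, text)))
```

For example, with a stub model that always answers "Yes.", `classify(ClassificationTask("Is this a randomized trial?", ("yes", "no")), "Patients were randomly assigned ...", lambda p: "Yes.")` returns `"yes"`, while a reply such as "maybe" yields `None` and would count as an invalid response. Swapping the question, the category list, or the callable changes the task without touching the rest of the code.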

Results: The system consistently produced valid responses with the local Mixtral-8x7B-Instruct model and the Llama3.1-70B-Instruct model. The response validity rates for the cloud-based Mixtral and Qwen models were 99.70% and 99.88%, respectively. Across all models, the framework achieved an overall accuracy of >94%, precision of >92%, recall of >90%, and an F1-score of >92%. Question-specific accuracy ranged from 86.33% to 99.83% for the local Mixtral model, from 85.49% to 99.83% for the cloud-based Mixtral model, from 90.50% to 99.83% for the Llama3.1 model, and from 77.13% to 99.83% for the Qwen model.
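The reported quantities (response validity rate, accuracy, precision, recall, F1-score) follow standard binary-classification definitions. The sketch below shows one way to compute them from gold labels and parsed model outputs. The function name `evaluate`, the positive label `"yes"`, and in particular the choice to exclude invalid replies (`None`) from the confusion matrix while counting them in the validity rate are assumptions; the paper may handle invalid responses differently.

```python
from typing import Optional, Sequence


def evaluate(gold: Sequence[str], predicted: Sequence[Optional[str]],
             positive: str = "yes") -> dict:
    """Binary-classification metrics plus the share of valid (parseable) model replies.

    Assumption: invalid replies (None) lower the validity rate but are excluded
    from the confusion matrix used for accuracy, precision, recall and F1.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    valid = [(g, p) for g, p in zip(gold, predicted) if p is not None]
    tp = sum(1 for g, p in valid if g == positive and p == positive)
    tn = sum(1 for g, p in valid if g != positive and p != positive)
    fp = sum(1 for g, p in valid if g != positive and p == positive)
    fn = sum(1 for g, p in valid if g == positive and p != positive)
    n = len(valid)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "validity_rate": n / len(gold) if gold else 0.0,
        "accuracy": (tp + tn) / n if n else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

Applied to five toy items, `evaluate(["yes", "no", "yes", "no", "yes"], ["yes", "no", "no", None, "yes"])` gives a validity rate of 0.8 (one invalid reply) and an accuracy of 0.75 over the four valid replies. Computing these metrics separately per classification question and taking the minimum and maximum would give question-specific ranges of the kind reported above.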

Conclusion: The LLM-based classification framework exhibits robust accuracy and adaptability across various oncological trial classification tasks. Although challenges remain, such as strong prompt dependence and high computational and hardware demands, LLMs will play a crucial role in automating the classification of oncological trials and literature as the technology continues to advance.




Source journal: Oncology (Medicine: Oncology)

CiteScore: 6.00

Self-citation rate: 2.90%

Review time: 6-12 weeks

Journal description: Although laboratory and clinical cancer research need to be closely linked, observations at the basic level often remain removed from medical applications. This journal works to accelerate the translation of experimental results into the clinic, and back again into the laboratory for further investigation. The fundamental purpose of this effort is to advance clinically relevant knowledge of cancer and to improve the outcomes of prevention, diagnosis and treatment of malignant disease. The journal publishes significant clinical studies from cancer programs around the world, along with important translational laboratory findings, mini-reviews (invited and submitted) and in-depth discussions of evolving and controversial topics in the oncology arena. A unique feature of the journal is a new section that focuses on rapid peer review and subsequent publication of short reports of phase 1 and phase 2 clinical cancer trials, with the goal of ensuring that high-quality clinical cancer research quickly enters the public domain, regardless of the trial's ultimate conclusions regarding efficacy or toxicity.