Application of a general LLM-based classification system to retrieve information about oncological trials
Fabio Dennstädt, Paul Windisch, Irina Filchenko, Johannes Zink, Paul Martin Putora, Ahmed Shaheen, Roberto Gaio, Nikola Cihoric, Marie Wosny, Stefanie Aeppli, Max Schmerder, Mohamed Shelan, Janna Hastings
Oncology, pp. 1-18. Published 2025-06-13. DOI: 10.1159/000546946 (https://doi.org/10.1159/000546946)
Abstract
Purpose: The automated classification of clinical trials and key categories within the medical literature is increasingly relevant, particularly in oncology, as the volume of publications and trial reports continues to expand. Large Language Models (LLMs) may provide new opportunities for automating diverse classification tasks. They could be used for general-purpose text classification to retrieve information about oncological trials.
Methods and materials: A general text classification framework with an adaptable prompt, model, and set of categories was developed. The framework was tested with four datasets comprising nine binary classification questions related to oncological trials. Evaluation was conducted using a locally hosted Mixtral-8x7B-Instruct v0.1-GPTQ model and three cloud-based LLMs: Mixtral-8x7B-Instruct v0.1, Llama3.1-70B-Instruct, and Qwen-2.5-72B.
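The idea of a framework whose prompt, model, and categories can each be swapped might be sketched as follows. This is a minimal illustration and not the authors' implementation: the class name `TextClassifier`, the prompt wording, the parsing rule, and the `keyword_stub` stand-in for an LLM call are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class TextClassifier:
    """Hypothetical wrapper: prompt template, model, and categories are all swappable."""

    model: Callable[[str], str]   # any function mapping a prompt to raw model text
    categories: Sequence[str]     # allowed labels, e.g. ("yes", "no")
    prompt_template: str = (      # assumed wording, not the prompt used in the paper
        "Question: {question}\n"
        "Text: {text}\n"
        "Answer with exactly one of: {options}."
    )

    def build_prompt(self, question: str, text: str) -> str:
        """Fill the template with the question, the text, and the allowed labels."""
        return self.prompt_template.format(
            question=question, text=text, options=", ".join(self.categories)
        )

    def parse(self, raw: str) -> Optional[str]:
        """Return the matching category, or None if the response is not a valid label."""
        cleaned = raw.strip().strip(".!\"'").lower()
        for category in self.categories:
            if cleaned == category.lower():
                return category
        return None

    def classify(self, question: str, text: str) -> Optional[str]:
        """Build the prompt, query the model, and map the reply onto a category."""
        return self.parse(self.model(self.build_prompt(question, text)))


def keyword_stub(prompt: str) -> str:
    """Stand-in for an LLM call; a real setup would query a local or hosted model."""
    return "Yes." if "randomly assigned" in prompt.lower() else "no"


question = "Does the abstract describe a randomized controlled trial?"
classifier = TextClassifier(model=keyword_stub, categories=("yes", "no"))
label = classifier.classify(question, "Patients were randomly assigned to arm A or B.")
```

Because the model is passed in as a plain callable, the same classifier object could in principle wrap a locally hosted model or a cloud API without changing the parsing logic. A `None` result marks a response that does not map onto any category.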
Results: The system consistently produced valid responses with the local Mixtral-8x7B-Instruct model and the Llama3.1-70B-Instruct model. It achieved a response validity rate of 99.70% and 99.88% for the cloud-based Mixtral and Qwen models, respectively. Across all models, the framework achieved an overall accuracy of >94%, precision of >92%, recall of >90%, and an F1-score of >92%. Question-specific accuracy ranged from 86.33% to 99.83% for the local Mixtral model, 85.49% to 99.83% for the cloud-based Mixtral model, 90.50% to 99.83% for the Llama3.1 model, and 77.13% to 99.83% for the Qwen model.
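The reported metrics follow the standard definitions for binary classification, which can be sketched as below. The function name `evaluate` is hypothetical, and excluding invalid responses (represented here as `None`) from accuracy, precision, recall, and F1 is an assumption; the abstract does not state how invalid responses were scored.

```python
from typing import Dict, Optional, Sequence


def evaluate(
    predictions: Sequence[Optional[str]],
    truths: Sequence[str],
    positive: str = "yes",
) -> Dict[str, float]:
    """Compute response validity, accuracy, precision, recall, and F1.

    A prediction of None stands for an invalid model response. Assumption:
    invalid responses count against validity only and are excluded from
    the remaining metrics.
    """
    if len(predictions) != len(truths):
        raise ValueError("predictions and truths must have the same length")

    valid = [(p, t) for p, t in zip(predictions, truths) if p is not None]
    tp = sum(1 for p, t in valid if p == positive and t == positive)
    fp = sum(1 for p, t in valid if p == positive and t != positive)
    fn = sum(1 for p, t in valid if p != positive and t == positive)
    tn = sum(1 for p, t in valid if p != positive and t != positive)

    def ratio(numerator: float, denominator: float) -> float:
        # Return 0.0 for an empty denominator instead of raising.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "validity": ratio(len(valid), len(predictions)),
        "accuracy": ratio(tp + tn, len(valid)),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
    }


metrics = evaluate(
    predictions=["yes", "yes", "no", "no", None],
    truths=["yes", "no", "no", "yes", "yes"],
)
```

In this toy input, four of five responses are valid, giving a validity of 0.8, and each of the other metrics comes out at 0.5. Running such a function once per classification question would give question-specific figures comparable in form to those reported.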
Conclusions: The LLM-based classification framework exhibits robust accuracy and adaptability across various oncological trial classification tasks. Although challenges remain, such as strong prompt dependence and high computational and hardware demands, LLMs will play a crucial role in automating the classification of oncological trials and literature as the technology continues to advance.
About the journal:
Although laboratory and clinical cancer research need to be closely linked, observations at the basic level often remain removed from medical applications. This journal works to accelerate the translation of experimental results into the clinic, and back again into the laboratory for further investigation. The fundamental purpose of this effort is to advance clinically relevant knowledge of cancer, and improve the outcome of prevention, diagnosis and treatment of malignant disease. The journal publishes significant clinical studies from cancer programs around the world, along with important translational laboratory findings, mini-reviews (invited and submitted) and in-depth discussions of evolving and controversial topics in the oncology arena. A unique feature of the journal is a new section which focuses on rapid peer-review and subsequent publication of short reports of phase 1 and phase 2 clinical cancer trials, with a goal of ensuring that high-quality clinical cancer research quickly enters the public domain, regardless of the trial’s ultimate conclusions regarding efficacy or toxicity.