From RAGs to riches: Utilizing large language models to write documents for clinical trials
Nigel Markey, Ilyass El-Mansouri, Gaetan Rensonnet, Casper van Langen, Christoph Meier
Clinical Trials (published 2025-02-27). DOI: 10.1177/17407745251320806
Abstract
Background/aims: Clinical trials require numerous documents to be written: protocols, consent forms, clinical study reports, and many others. Large language models offer the potential to rapidly generate first-draft versions of these documents; however, there are concerns about the quality of their output. Here, we report an evaluation of how well large language models perform at generating sections of one such document, the clinical trial protocol.
Methods: Using an off-the-shelf large language model, we generated protocol sections for a broad range of diseases and clinical trial phases. We assessed each of these document sections across four dimensions: Clinical thinking and logic; Transparency and references; Medical and clinical terminology; and Content relevance and suitability. To improve performance, we used the retrieval-augmented generation method to enhance the large language model with accurate, up-to-date information, including regulatory guidance documents and data from ClinicalTrials.gov. Using this retrieval-augmented generation large language model, we regenerated the same protocol sections and assessed them across the same four dimensions.
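The retrieval-augmented generation approach described above can be illustrated with a minimal sketch: relevant snippets (standing in for regulatory guidance and ClinicalTrials.gov records) are retrieved for a query and prepended to the prompt sent to the language model. The paper does not publish its pipeline; the snippets, the bag-of-words scoring (a stand-in for embedding similarity), and the prompt template below are all illustrative assumptions.

```python
# Minimal, hypothetical sketch of RAG prompt assembly; not the authors' pipeline.
import re
from collections import Counter

# Illustrative knowledge base: short snippets standing in for regulatory
# guidance documents and ClinicalTrials.gov records.
SNIPPETS = [
    "ICH E6(R2) requires the protocol to describe trial objectives and design.",
    "NCT record: phase 3 randomized trial of drug X in type 2 diabetes.",
    "Consent forms must state risks, benefits, and voluntary participation.",
]

def _tokens(text):
    """Lowercased word tokens, punctuation stripped."""
    return Counter(re.findall(r"\w+", text.lower()))

def score(query, doc):
    """Bag-of-words overlap score (stand-in for embedding similarity)."""
    return sum((_tokens(query) & _tokens(doc)).values())

def retrieve(query, snippets, k=2):
    """Return the top-k snippets most relevant to the query."""
    return sorted(snippets, key=lambda s: score(query, s), reverse=True)[:k]

def build_prompt(query, snippets):
    """Prepend retrieved context so the model grounds its draft in sources."""
    context = "\n".join(f"- {s}" for s in retrieve(query, snippets))
    return f"Context:\n{context}\n\nTask: Draft the protocol section: {query}"

prompt = build_prompt("objectives and design of a phase 3 diabetes trial", SNIPPETS)
print(prompt)
```

In a full pipeline, `prompt` would be sent to the large language model in place of the bare query, which is what lets the model cite up-to-date sources rather than rely on its training data alone.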
Results: We find that the off-the-shelf large language model delivers reasonable results, especially when assessing content relevance and the correct use of medical and clinical terminology, with scores of over 80%. However, the off-the-shelf large language model shows limited performance in clinical thinking and logic and transparency and references, with assessment scores of ≈40% or less. The use of retrieval-augmented generation substantially improves the writing quality of the large language model, with clinical thinking and logic and transparency and references scores increasing to ≈80%. The retrieval-augmented generation method thus greatly improves the practical usability of large language models for clinical trial-related writing.
Discussion: Our results suggest that hybrid large language model architectures, such as the retrieval-augmented generation method we utilized, offer strong potential for clinical trial-related writing, including a wide variety of documents. This is potentially transformative, since it addresses several major bottlenecks of drug development.
About the journal:
Clinical Trials is dedicated to advancing knowledge of the design and conduct of clinical trials and related research methodologies. Covering the design, conduct, analysis, synthesis, and evaluation of key methodologies, the journal stays at the forefront of the latest topics, including ethics, regulation, and policy impact.