Large Language Models for Clinical Trial Protocol Assessments
Authors: Euibeom Shin, Amruta Gajanan Bhat, Murali Ramanathan
Journal: Clinical Pharmacology & Therapeutics (impact factor 5.5; JCR Q1, Pharmacology & Pharmacy)
Publication date: 2025-10-21
DOI: 10.1002/cpt.70096 (https://doi.org/10.1002/cpt.70096)
The purpose was to evaluate the utility of large language models (LLMs) for reviewing the statistical analysis plan (SAP) and pharmacokinetics-pharmacodynamics (PK-PD) components of clinical trial protocols. Clinical trial protocols and SAPs were obtained from clinicaltrials.gov for a testbed of 15 small-molecule drugs, biologics, and global antibiotic and public health interventions. The GPT-4o (ChatGPT) LLM was used to elicit study design attributes, relevant guidelines, and detailed SAP evaluations with prompts engineered to the persona of a regulatory expert. The SAP methodology was assessed against the Food and Drug Administration's (FDA) E9 Statistical Principles for Clinical Trials guidance. The SAP evaluation outputs were assessed in post hoc analyses with ChatGPT and Grok, based on a rubric that evaluated the accuracy of primary outcome identification, the correctness of statistical methodology, compliance with the FDA E9 guidance, and clinical interpretability. PK-PD analysis plans were assessed on the accuracy of PK-PD objectives and measures and PK analysis methods. ChatGPT accurately identified the disease, intervention, and comparator groups for all trials, as well as the study sample size for 14 out of 15 trials. The most frequently cited guidelines were the FDA's E9 guidance for SAP and the FDA Guidance for Industry: Population Pharmacokinetics for PK-PD. ChatGPT outputs of the SAP and PK-PD analysis plans were clear and organized, demonstrating a satisfactory ability to extract and summarize technical details; some limitations in contextual accuracy were observed. LLMs can be effective tools for assessing the SAP, PK-PD, and other aspects of clinical trial protocol reviews.
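The persona prompt and the four-part rubric described in the abstract could be organized roughly as follows. This is a minimal sketch, not the authors' implementation: the persona wording, the 1-5 scoring scale, and the names `build_review_prompt` and `RubricScore` are all assumptions made for illustration, and no LLM is actually called.

```python
from dataclasses import dataclass
from statistics import mean

# The four rubric criteria named in the abstract.
RUBRIC_CRITERIA = (
    "accuracy of primary outcome identification",
    "correctness of statistical methodology",
    "compliance with the FDA E9 guidance",
    "clinical interpretability",
)

# Hypothetical persona wording; the authors' actual prompts are not given here.
PERSONA = "You are a regulatory expert reviewing a clinical trial protocol."


def build_review_prompt(protocol_text: str) -> str:
    """Assemble a persona-style prompt asking for an SAP evaluation."""
    return (
        f"{PERSONA}\n"
        "Evaluate the statistical analysis plan (SAP) below against the "
        "FDA E9 Statistical Principles for Clinical Trials guidance.\n\n"
        f"{protocol_text}"
    )


@dataclass
class RubricScore:
    """Scores for one SAP evaluation, keyed by rubric criterion."""

    scores: dict[str, int]  # assumed 1-5 scale (not stated in the abstract)

    def validate(self) -> None:
        """Raise ValueError if a criterion is missing or a score is out of range."""
        missing = [c for c in RUBRIC_CRITERIA if c not in self.scores]
        if missing:
            raise ValueError(f"missing criteria: {missing}")
        for criterion, value in self.scores.items():
            if not 1 <= value <= 5:
                raise ValueError(f"{criterion}: score {value} outside 1-5")

    def overall(self) -> float:
        """Return the unweighted mean across the four criteria."""
        self.validate()
        return float(mean(self.scores[c] for c in RUBRIC_CRITERIA))
```

An unweighted mean is only one way to combine the criteria; a reviewer might instead report each criterion separately so that a weak score on statistical methodology is not masked by strong scores elsewhere.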
About the journal:
Clinical Pharmacology & Therapeutics (CPT) is the authoritative cross-disciplinary journal in experimental and clinical medicine devoted to publishing advances in the nature, action, efficacy, and evaluation of therapeutics. CPT welcomes original Articles in the emerging areas of translational, predictive and personalized medicine; new therapeutic modalities including gene and cell therapies; pharmacogenomics, proteomics and metabolomics; bioinformation and applied systems biology complementing areas of pharmacokinetics and pharmacodynamics, human investigation and clinical trials, pharmacovigilance, pharmacoepidemiology, pharmacometrics, and population pharmacology.