Gerard Ompad, Keele Wurst, Darmendra Ramcharran, Anders Hviid, Andrew Bate, Maurizio Sessa
Clinical Pharmacology & Therapeutics · Journal Article · Published 2025-09-10 · DOI: 10.1002/cpt.70039
Off-the-Shelf Large Language Models for Guiding Pharmacoepidemiological Study Design.
This study aimed to assess the ability of two off-the-shelf large language models, ChatGPT and Gemini, to support the design of pharmacoepidemiological studies. We assessed 48 protocols of pharmacoepidemiological studies published between 2018 and 2024, covering various study types, including disease epidemiology, drug utilization, safety, and effectiveness. The coherence (i.e., "Is the response coherent and well-formed, or is it difficult to understand?") and relevance (i.e., "Is the response relevant and informative, or is it lacking in substance?") of the large language models' responses were evaluated by human experts across seven key study design components. Coding accuracy was also assessed. Both large language models demonstrated high coherence, with over 90% of study components rated as "Strongly agree" by experts for most categories. ChatGPT achieved the highest coherence for "Index date" (97.9%) and "Study design" (95.8%). Gemini excelled in "Study outcome" (93.9%) and "Study exposure" (95.9%). Relevance, however, was more variable, with ChatGPT aligning with expert recommendations in over 90% of cases for "Index date" and "Study design" but showing lower agreement for covariates (65%) and follow-up (70%). Coding agreement percentages revealed varying levels of concordance, with the Anatomical Therapeutic Chemical (ATC) classification system demonstrating the highest agreement with experts at 50%. In contrast, the Current Procedural Terminology (CPT) and International Classification of Diseases (ICD) systems showed agreement of 22.2% and 20%, respectively. While ChatGPT and Gemini show promise in certain tasks supporting pharmacoepidemiological study design, their limitations in relevance and coding accuracy highlight the need for critical oversight by domain experts.
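The coding-agreement figures above (e.g., 50% for ATC) can be read as simple percent agreement between model-suggested and expert-assigned codes. A minimal illustrative sketch, not taken from the paper and using hypothetical code assignments:

```python
def percent_agreement(model_codes, expert_codes):
    """Share of items (as a percentage) where the model's code
    exactly matches the expert's code for the same item."""
    assert len(model_codes) == len(expert_codes)
    matches = sum(m == e for m, e in zip(model_codes, expert_codes))
    return 100.0 * matches / len(model_codes)

# Hypothetical ATC assignments for four drugs; the first two match,
# the last two differ, yielding 50% agreement.
model = ["C09AA02", "N02BE01", "A10BA02", "J01CA04"]
expert = ["C09AA02", "N02BE01", "A10BB01", "C07AB02"]
print(f"{percent_agreement(model, expert):.1f}%")  # → 50.0%
```

Exact-match agreement is a strict criterion for hierarchical coding systems such as ATC or ICD, since a near-miss at a lower hierarchy level counts the same as a completely wrong code.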
Journal Introduction:
Clinical Pharmacology & Therapeutics (CPT) is the authoritative cross-disciplinary journal in experimental and clinical medicine devoted to publishing advances in the nature, action, efficacy, and evaluation of therapeutics. CPT welcomes original Articles in the emerging areas of translational, predictive and personalized medicine; new therapeutic modalities including gene and cell therapies; pharmacogenomics, proteomics and metabolomics; bioinformation and applied systems biology complementing areas of pharmacokinetics and pharmacodynamics, human investigation and clinical trials, pharmacovigilance, pharmacoepidemiology, pharmacometrics, and population pharmacology.