Gerard Ompad, Keele Wurst, Darmendra Ramcharran, Anders Hviid, Andrew Bate, Maurizio Sessa
Clinical Pharmacology & Therapeutics · Journal Article · Published 2025-09-10 · DOI: 10.1002/cpt.70039
Off-the-Shelf Large Language Models for Guiding Pharmacoepidemiological Study Design.
This study aimed to assess the ability of two off-the-shelf large language models, ChatGPT and Gemini, to support the design of pharmacoepidemiological studies. We assessed 48 protocols of pharmacoepidemiological studies published between 2018 and 2024, covering various study types, including disease epidemiology, drug utilization, safety, and effectiveness. The coherence (i.e., "Is the response coherent and well-formed, or is it difficult to understand?") and relevance (i.e., "Is the response relevant and informative, or is it lacking in substance?") of the large language models' responses were evaluated by human experts across seven key study design components. Coding accuracy was also assessed. Both large language models demonstrated high coherence, with over 90% of study components rated as "Strongly agree" by experts for most categories. ChatGPT achieved the highest coherence for "Index date" (97.9%) and "Study design" (95.8%). Gemini excelled in "Study outcome" (93.9%) and "Study exposure" (95.9%). Relevance, however, was more variable, with ChatGPT aligning with expert recommendations in over 90% of cases for "Index date" and "Study design" but showing lower agreement for covariates (65%) and follow-up (70%). Coding agreement percentages revealed varying levels of concordance, with the Anatomical Therapeutic Chemical (ATC) classification system demonstrating the highest agreement with experts at 50%. In contrast, the Current Procedural Terminology (CPT) and International Classification of Diseases (ICD) systems showed agreement of 22.2% and 20%, respectively. While ChatGPT and Gemini show promise in certain tasks supporting pharmacoepidemiological study design, their limitations in relevance and coding accuracy highlight the need for critical oversight by domain experts.
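The coding-agreement figures above (e.g., 50% for ATC) can be read as simple percent agreement between model-suggested and expert-assigned codes. A minimal illustrative sketch, not taken from the paper and using hypothetical code assignments:

```python
def percent_agreement(model_codes, expert_codes):
    """Share of items (as a percentage) where the model's code
    exactly matches the expert's code for the same item."""
    assert len(model_codes) == len(expert_codes)
    matches = sum(m == e for m, e in zip(model_codes, expert_codes))
    return 100.0 * matches / len(model_codes)

# Hypothetical ATC assignments for four drugs; the first two match,
# the last two differ, yielding 50% agreement.
model = ["C09AA02", "N02BE01", "A10BA02", "J01CA04"]
expert = ["C09AA02", "N02BE01", "A10BB01", "C07AB02"]
print(f"{percent_agreement(model, expert):.1f}%")  # → 50.0%
```

Exact-match agreement is a strict criterion for hierarchical coding systems such as ATC or ICD, since a near-miss at a lower hierarchy level counts the same as a completely wrong code.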
Journal Introduction:
Clinical Pharmacology & Therapeutics (CPT) is the authoritative cross-disciplinary journal in experimental and clinical medicine devoted to publishing advances in the nature, action, efficacy, and evaluation of therapeutics. CPT welcomes original Articles in the emerging areas of translational, predictive and personalized medicine; new therapeutic modalities including gene and cell therapies; pharmacogenomics, proteomics and metabolomics; bioinformation and applied systems biology complementing areas of pharmacokinetics and pharmacodynamics, human investigation and clinical trials, pharmacovigilance, pharmacoepidemiology, pharmacometrics, and population pharmacology.