Thomas Thesen, Rupa Lalchandani Tuan, Joe Blumer, Michael W Lee
British Journal of Clinical Pharmacology (JCR Q2, Pharmacology & Pharmacy; impact factor 3.1). Journal Article, published 2025-06-08. DOI: 10.1002/bcp.70119
LLM-based generation of USMLE-style questions with ASPET/AMSPC knowledge objectives: All RAGs and no riches.
Developing high-quality pharmacology multiple-choice questions (MCQs) is challenging, in large part due to continually evolving therapeutic guidelines and the complex integration of basic science and clinical medicine in this subject area. Large language models (LLMs) like ChatGPT-4 have repeatedly demonstrated proficiency in answering medical licensing exam questions, prompting interest in their use for generating high-stakes exam-style questions. This study evaluates the performance of ChatGPT-4o in generating USMLE-style pharmacology questions based on American Society for Pharmacology and Experimental Therapeutics/Association of Medical School Pharmacology Chairs (ASPET/AMSPC) knowledge objectives and assesses the impact of retrieval-augmented generation (RAG) on question accuracy and quality. Using standardized prompts, 50 questions (25 RAG and 25 non-RAG) were generated and subsequently evaluated by expert reviewers. Results showed higher accuracy for non-RAG questions (88.0% vs. 69.2%), though the difference was not statistically significant. No significant differences were observed in other quality dimensions. These findings suggest that sophisticated LLMs can generate high-quality pharmacology questions efficiently without RAG, though human oversight remains crucial.
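The two conditions compared in the study can be illustrated with a minimal sketch: a plain (non-RAG) prompt built directly from a knowledge objective, versus a retrieval-augmented prompt that first prepends reference passages. The toy token-overlap retriever and the prompt wording below are illustrative assumptions, not the authors' actual pipeline or prompts.

```python
def retrieve(query, corpus, k=1):
    """Toy retriever: rank passages by the number of tokens shared with the query."""
    q = set(query.lower().split())
    ranked = sorted(corpus, key=lambda p: len(q & set(p.lower().split())), reverse=True)
    return ranked[:k]

def build_prompt(objective, corpus=None):
    """Build a USMLE-style MCQ generation prompt; pass a corpus for the RAG condition."""
    base = ("Write one USMLE-style multiple-choice pharmacology question "
            f"assessing this knowledge objective: {objective}")
    if corpus is None:
        return base  # non-RAG condition: objective only
    context = "\n".join(retrieve(objective, corpus))
    return f"Reference material:\n{context}\n\n{base}"  # RAG condition

# Hypothetical reference passages standing in for curated source material.
corpus = [
    "Beta-blockers reduce heart rate and myocardial oxygen demand.",
    "Loop diuretics inhibit the Na-K-2Cl cotransporter in the thick ascending limb.",
]
print(build_prompt("mechanism of loop diuretics", corpus))
```

In the study's design, 25 questions were generated under each condition and then scored by expert reviewers; a sketch like this only shows where the retrieved context enters the prompt, which is the sole variable between the two arms.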
Journal introduction:
Published on behalf of the British Pharmacological Society, the British Journal of Clinical Pharmacology features papers and reports on all aspects of drug action in humans: review articles, mini review articles, original papers, commentaries, editorials and letters. The Journal enjoys a wide readership, bridging the gap between the medical profession, clinical research and the pharmaceutical industry. It also publishes research on new methods, new drugs and new approaches to treatment. The Journal is recognised as one of the leading publications in its field. It is online only, publishes open access research through its OnlineOpen programme and is published monthly.