Anna Flamigni, Giulia Zamagni, Gilda Paternuosto, Anna Arbo
British Journal of Clinical Pharmacology. Journal Article, published 2025-07-16. DOI: 10.1002/bcp.70168 (https://doi.org/10.1002/bcp.70168). Impact factor 3.1; JCR Q2, Pharmacology & Pharmacy.
Paediatric rare diseases: Can large language models assist off-label prescribing?
Aims: To evaluate the effectiveness and reliability of large language models (LLMs) in retrieving and synthesizing biomedical information to support off-label drug prescribing in paediatric rare diseases, and to compare their performance with human-authored references in terms of scientific rationale, adverse events and drug interactions.
Methods: The study reviewed 20 cases of off-label prescriptions in rare paediatric diseases using 4 LLMs (GPT-4o, Sophos-2, Claude-3 and Scopus AI). The queries focused on scientific rationale, adverse events and drug interactions. The performance measures were sensitivity, precision, accuracy, F1-score, response quality and reference quality. A Global Performance Score integrated all measures.
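For readers unfamiliar with the retrieval measures named in the Methods, the following is a minimal sketch of their standard textbook definitions, computed from confusion-matrix counts. The abstract does not state how the authors counted true or false positives, nor how the Global Performance Score combines the measures, so the `Counts` layout, the function names and the example numbers here are illustrative assumptions rather than the study's procedure.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion-matrix counts for one model on one query (hypothetical layout)."""
    tp: int  # relevant references that the model retrieved
    fp: int  # retrieved references judged not relevant
    fn: int  # relevant references that the model missed
    tn: int  # non-relevant references correctly left out


def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def metrics(c: Counts) -> dict:
    """Standard definitions; not necessarily the exact ones used in the study."""
    sensitivity = safe_div(c.tp, c.tp + c.fn)  # also called recall
    precision = safe_div(c.tp, c.tp + c.fp)
    accuracy = safe_div(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn)
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = safe_div(2 * precision * sensitivity, precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Made-up counts, for illustration only (not data from the study).
example = metrics(Counts(tp=8, fp=2, fn=2, tn=8))
print(example)
```

Response quality and reference quality are rated measures rather than count-based ones, so they are left out of this sketch.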
Results: After evaluating 2758 references and 480 responses, a significant difference in Global Performance Score was found among the 4 LLMs (P = .001). Post hoc analysis showed that the Scopus AI vs. GPT-4o comparison was significant, with GPT-4o showing higher values. Median LLM reference quality often surpassed human performance, yet variability limits conclusions regarding superiority.
Conclusions: LLMs are capable of retrieving and synthesizing biomedical information, but performance varies depending on query type and search mode. These tools speed up the retrieval of relevant information for assessing the appropriateness of off-label prescribing. Despite the promise of artificial intelligence, human oversight remains critical to ensure data accuracy and reliability.
About the journal:
Published on behalf of the British Pharmacological Society, the British Journal of Clinical Pharmacology features papers and reports on all aspects of drug action in humans: review articles, mini review articles, original papers, commentaries, editorials and letters. The Journal enjoys a wide readership, bridging the gap between the medical profession, clinical research and the pharmaceutical industry. It also publishes research on new methods, new drugs and new approaches to treatment. The Journal is recognised as one of the leading publications in its field. It is online only, publishes open access research through its OnlineOpen programme and is published monthly.