Natthanaphop Isaradech, Andrea Riedel, Wachiranun Sirikul, Markus Kreuzthaler, Stefan Schulz
{"title":"基于大语言模型的药物处方零弹和少弹命名实体识别和文本扩展","authors":"Natthanaphop Isaradech , Andrea Riedel , Wachiranun Sirikul , Markus Kreuzthaler , Stefan Schulz","doi":"10.1016/j.artmed.2025.103165","DOIUrl":null,"url":null,"abstract":"<div><div>Medication prescriptions in electronic health records (EHR) are often in free-text and may include a mix of languages, local brand names, and a wide range of idiosyncratic formats and abbreviations. Large language models (LLMs) have shown a promising ability to generate text in response to input prompts. We use ChatGPT3.5 to automatically structure and expand medication statements in discharge summaries and thus make them easier to interpret for people and machines. Named Entity Recognition (NER) and Text Expansion (EX) are used with different prompt strategies in a zero- and few-shot setting. 100 medication statements were manually annotated and curated. NER performance was measured by using strict and partial matching. For the EX task, two experts interpreted the results by assessing semantic equivalence between original and expanded statements. The model performance was measured by precision, recall, and F1 score. For NER, the best-performing prompt reached an average F1 score of 0.94 in the test set. For EX, the few-shot prompt showed superior performance among other prompts, with an average F1 score of 0.87. Our study demonstrates good performance for NER and EX tasks in free-text medication statements using ChatGPT3.5. Compared to a zero-shot baseline, a few-shot approach prevented the system from hallucinating, which is essential when processing safety-relevant medication data. We tested ChatGPT3.5-tuned prompts on other LLMs, including ChatGPT4o, Gemini 2.0 Flash, MedLM-1.5-Large, and DeepSeekV3. The findings showed most models outperformed ChatGPT3.5 in NER and EX tasks.</div></div>","PeriodicalId":55458,"journal":{"name":"Artificial Intelligence in Medicine","volume":"167 ","pages":"Article 103165"},"PeriodicalIF":6.2000,"publicationDate":"2025-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Zero- and few-shot Named Entity Recognition and Text Expansion in medication prescriptions using large language models\",\"authors\":\"Natthanaphop Isaradech , Andrea Riedel , Wachiranun Sirikul , Markus Kreuzthaler , Stefan Schulz\",\"doi\":\"10.1016/j.artmed.2025.103165\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Medication prescriptions in electronic health records (EHR) are often in free-text and may include a mix of languages, local brand names, and a wide range of idiosyncratic formats and abbreviations. Large language models (LLMs) have shown a promising ability to generate text in response to input prompts. We use ChatGPT3.5 to automatically structure and expand medication statements in discharge summaries and thus make them easier to interpret for people and machines. Named Entity Recognition (NER) and Text Expansion (EX) are used with different prompt strategies in a zero- and few-shot setting. 100 medication statements were manually annotated and curated. NER performance was measured by using strict and partial matching. For the EX task, two experts interpreted the results by assessing semantic equivalence between original and expanded statements. The model performance was measured by precision, recall, and F1 score. For NER, the best-performing prompt reached an average F1 score of 0.94 in the test set. 
For EX, the few-shot prompt showed superior performance among other prompts, with an average F1 score of 0.87. Our study demonstrates good performance for NER and EX tasks in free-text medication statements using ChatGPT3.5. Compared to a zero-shot baseline, a few-shot approach prevented the system from hallucinating, which is essential when processing safety-relevant medication data. We tested ChatGPT3.5-tuned prompts on other LLMs, including ChatGPT4o, Gemini 2.0 Flash, MedLM-1.5-Large, and DeepSeekV3. The findings showed most models outperformed ChatGPT3.5 in NER and EX tasks.</div></div>\",\"PeriodicalId\":55458,\"journal\":{\"name\":\"Artificial Intelligence in Medicine\",\"volume\":\"167 \",\"pages\":\"Article 103165\"},\"PeriodicalIF\":6.2000,\"publicationDate\":\"2025-06-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Artificial Intelligence in Medicine\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0933365725001009\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Artificial Intelligence in Medicine","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0933365725001009","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Zero- and few-shot Named Entity Recognition and Text Expansion in medication prescriptions using large language models
Medication prescriptions in electronic health records (EHRs) are often recorded as free text and may mix languages, local brand names, and a wide range of idiosyncratic formats and abbreviations. Large language models (LLMs) have shown a promising ability to generate text in response to input prompts. We use ChatGPT3.5 to automatically structure and expand medication statements in discharge summaries, making them easier for people and machines to interpret. Named Entity Recognition (NER) and Text Expansion (EX) are performed with different prompt strategies in zero- and few-shot settings. A set of 100 medication statements was manually annotated and curated. NER performance was measured using strict and partial matching. For the EX task, two experts assessed semantic equivalence between the original and expanded statements. Model performance was measured by precision, recall, and F1 score. For NER, the best-performing prompt reached an average F1 score of 0.94 on the test set. For EX, the few-shot prompt outperformed the other prompts, with an average F1 score of 0.87. Our study demonstrates good performance on NER and EX tasks over free-text medication statements using ChatGPT3.5. Compared to a zero-shot baseline, the few-shot approach prevented the system from hallucinating, which is essential when processing safety-relevant medication data. We also tested the ChatGPT3.5-tuned prompts on other LLMs, including ChatGPT4o, Gemini 2.0 Flash, MedLM-1.5-Large, and DeepSeekV3; most of these models outperformed ChatGPT3.5 on the NER and EX tasks.
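To make the setup concrete, the sketch below illustrates how a few-shot NER prompt over a free-text medication statement could be composed and scored with strict-match precision, recall, and F1. This is a minimal sketch under stated assumptions, not the authors' actual prompts or evaluation code: the entity labels, example statement, JSON output format, model name "gpt-3.5-turbo", and OpenAI client usage are illustrative choices, and the strict matching shown here is a simplified entity-level comparison rather than the paper's exact scoring scheme.

```python
# Minimal sketch (illustrative assumptions, not the paper's actual prompts or code):
# few-shot NER prompting over a free-text medication statement, plus a simplified
# strict-match precision/recall/F1 computation at the entity level.
import json
from openai import OpenAI  # assumes the openai>=1.x client and an OPENAI_API_KEY in the environment

# Hypothetical entity labels; the paper's own label set may differ.
ENTITY_LABELS = ["drug", "strength", "form", "dose", "frequency", "route", "duration"]

# One illustrative few-shot example with an idiosyncratic, abbreviated statement.
FEW_SHOT_EXAMPLES = [
    {
        "statement": "Amoxicillin 500 mg cap 1x3 po pc x 7 d",
        "entities": {
            "drug": "Amoxicillin", "strength": "500 mg", "form": "cap",
            "dose": "1", "frequency": "3 times daily after meals",
            "route": "oral", "duration": "7 days",
        },
    },
]

def build_ner_prompt(statement: str) -> str:
    """Compose a few-shot prompt asking the model to return entities as a JSON object."""
    lines = [
        "Extract the following entities from the medication statement and "
        f"return them as a JSON object with the keys {ENTITY_LABELS}.",
        "Use null for entities that are not present. Do not invent information.",
    ]
    for ex in FEW_SHOT_EXAMPLES:
        lines.append(f"Statement: {ex['statement']}")
        lines.append(f"Entities: {json.dumps(ex['entities'])}")
    lines.append(f"Statement: {statement}")
    lines.append("Entities:")
    return "\n".join(lines)

def extract_entities(statement: str, model: str = "gpt-3.5-turbo") -> dict:
    """Call a chat-completion model and parse its JSON answer (no retry/repair logic here)."""
    client = OpenAI()
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # deterministic output helps when parsing structured JSON
        messages=[{"role": "user", "content": build_ner_prompt(statement)}],
    )
    return json.loads(response.choices[0].message.content)

def strict_prf(gold: dict, pred: dict) -> tuple[float, float, float]:
    """Strict matching: a predicted entity counts only if its string equals the gold string."""
    tp = sum(1 for k, v in pred.items() if v is not None and gold.get(k) == v)
    fp = sum(1 for k, v in pred.items() if v is not None and gold.get(k) != v)
    fn = sum(1 for k, v in gold.items() if v is not None and pred.get(k) != v)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

A partial-matching variant would relax the equality test (for example, counting overlapping substrings as correct), which is one plausible reading of the strict-versus-partial distinction reported above.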
Journal introduction:
Artificial Intelligence in Medicine publishes original articles from a wide variety of interdisciplinary perspectives concerning the theory and practice of artificial intelligence (AI) in medicine, medically-oriented human biology, and health care.
Artificial intelligence in medicine may be characterized as the scientific discipline pertaining to research studies, projects, and applications that aim at supporting decision-based medical tasks through knowledge- and/or data-intensive computer-based solutions that ultimately support and improve the performance of a human care provider.