{"title":"Prompting Is All You Need - Until It Isn't: Exploring the Limits of LLMs for Negation Detection in German Clinical Text.","authors":"Richard Zowalla, Martin Wiesner","doi":"10.3233/SHTI251374","DOIUrl":null,"url":null,"abstract":"<p><strong>Introduction: </strong>Detecting negations in clinical text is crucial for accurate documentation and decision-making.</p><p><strong>Methods: </strong>This study assesses open-source Large Language Models (LLMs) for detecting negations in German clinical discharge letters, comparing them to the rule-based approach (GeNeg) and human annotations.</p><p><strong>Results: </strong>While Llama 3.3 and Deepseek-R1 (70B) showed slight accuracy improvements, their high computational costs limit practicality compared to GeNeg. Llama 3.3 achieved the highest accuracy (.9670) and F1-score (.9620), outperforming all other models and slightly exceeding GeNeg in accuracy and F1-score. However, it required significantly more computational time (5.9 sec/sent) when compared to GeNeg's processing time (.005 sec/sent).</p><p><strong>Conclusion: </strong>The study results suggest hybrid approaches combining rule-based efficiency paired with LLMs' linguistic capabilities. 
In addition, future work should therefore optimize prompts and integrate LLMs with traditional methods to balance accuracy and efficiency.</p>","PeriodicalId":94357,"journal":{"name":"Studies in health technology and informatics","volume":"331 ","pages":"5-12"},"PeriodicalIF":0.0000,"publicationDate":"2025-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Studies in health technology and informatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3233/SHTI251374","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Introduction: Detecting negations in clinical text is crucial for accurate documentation and decision-making.
Methods: This study assesses open-source Large Language Models (LLMs) for detecting negations in German clinical discharge letters, comparing them to the rule-based approach (GeNeg) and human annotations.
Results: While Llama 3.3 and DeepSeek-R1 (70B) showed slight accuracy improvements, their high computational costs limit their practicality compared to GeNeg. Llama 3.3 achieved the highest accuracy (0.9670) and F1-score (0.9620), outperforming all other models and slightly exceeding GeNeg on both metrics. However, it required significantly more computational time (5.9 sec/sentence) than GeNeg (0.005 sec/sentence).
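For readers unfamiliar with the reported metrics, accuracy and F1-score for binary negation labels (negated vs. not negated) can be computed as below. This is a generic illustration, not the paper's evaluation code, and the label vectors are hypothetical.

```python
# Illustrative accuracy/F1 computation for binary negation detection
# (1 = negated, 0 = not negated). Labels below are hypothetical.
gold = [1, 0, 1, 1, 0, 0, 1, 0]  # human annotations
pred = [1, 0, 1, 0, 0, 1, 1, 0]  # model predictions

tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
tn = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 0)

accuracy = (tp + tn) / len(gold)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(accuracy, f1)  # prints: 0.75 0.75
```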
Conclusion: The study results suggest that hybrid approaches combining rule-based efficiency with LLMs' linguistic capabilities are promising. Future work should therefore optimize prompts and integrate LLMs with traditional methods to balance accuracy and efficiency.