{"title":"Prompting Is All You Need - Until It Isn't: Exploring the Limits of LLMs for Negation Detection in German Clinical Text.","authors":"Richard Zowalla, Martin Wiesner","doi":"10.3233/SHTI251374","DOIUrl":null,"url":null,"abstract":"<p><strong>Introduction: </strong>Detecting negations in clinical text is crucial for accurate documentation and decision-making.</p><p><strong>Methods: </strong>This study assesses open-source Large Language Models (LLMs) for detecting negations in German clinical discharge letters, comparing them to the rule-based approach (GeNeg) and human annotations.</p><p><strong>Results: </strong>While Llama 3.3 and Deepseek-R1 (70B) showed slight accuracy improvements, their high computational costs limit practicality compared to GeNeg. Llama 3.3 achieved the highest accuracy (.9670) and F1-score (.9620), outperforming all other models and slightly exceeding GeNeg in accuracy and F1-score. However, it required significantly more computational time (5.9 sec/sent) when compared to GeNeg's processing time (.005 sec/sent).</p><p><strong>Conclusion: </strong>The study results suggest hybrid approaches combining rule-based efficiency paired with LLMs' linguistic capabilities. 
In addition, future work should therefore optimize prompts and integrate LLMs with traditional methods to balance accuracy and efficiency.</p>","PeriodicalId":94357,"journal":{"name":"Studies in health technology and informatics","volume":"331 ","pages":"5-12"},"PeriodicalIF":0.0000,"publicationDate":"2025-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Studies in health technology and informatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3233/SHTI251374","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Introduction: Detecting negations in clinical text is crucial for accurate documentation and decision-making.
Methods: This study assesses open-source Large Language Models (LLMs) for detecting negations in German clinical discharge letters, comparing them to the rule-based approach (GeNeg) and human annotations.
Results: While Llama 3.3 and DeepSeek-R1 (70B) showed slight accuracy improvements, their high computational costs limit their practicality compared to GeNeg. Llama 3.3 achieved the highest accuracy (0.9670) and F1-score (0.9620), outperforming all other models and slightly exceeding GeNeg on both metrics. However, it required significantly more computational time (5.9 sec/sentence) than GeNeg (0.005 sec/sentence).
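For readers unfamiliar with the reported metrics, accuracy and F1-score for binary negation labels (negated vs. not negated) can be computed as below. This is a generic illustration, not the paper's evaluation code, and the label vectors are hypothetical.

```python
# Illustrative accuracy/F1 computation for binary negation detection
# (1 = negated, 0 = not negated). Labels below are hypothetical.
gold = [1, 0, 1, 1, 0, 0, 1, 0]  # human annotations
pred = [1, 0, 1, 0, 0, 1, 1, 0]  # model predictions

tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
tn = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 0)

accuracy = (tp + tn) / len(gold)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(accuracy, f1)  # prints: 0.75 0.75
```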
Conclusion: The study results suggest that hybrid approaches combining rule-based efficiency with LLMs' linguistic capabilities are promising. Future work should therefore optimize prompts and integrate LLMs with traditional methods to balance accuracy and efficiency.