Prompting Is All You Need - Until It Isn't: Exploring the Limits of LLMs for Negation Detection in German Clinical Text.
Authors: Richard Zowalla, Martin Wiesner
Journal: Studies in Health Technology and Informatics, vol. 331, pp. 5-12. Published 2025-09-03. DOI: 10.3233/SHTI251374 (https://doi.org/10.3233/SHTI251374)
Introduction: Detecting negations in clinical text is crucial for accurate documentation and decision-making.
Methods: This study assesses open-source Large Language Models (LLMs) for detecting negations in German clinical discharge letters, comparing them with the rule-based approach GeNeg and with human annotations.
Results: While Llama 3.3 and DeepSeek-R1 (70B) showed slight accuracy improvements, their high computational costs limit their practicality compared to GeNeg. Llama 3.3 achieved the highest accuracy (.9670) and F1-score (.9620), outperforming all other models and slightly exceeding GeNeg on both metrics. However, it required significantly more computation time (5.9 sec/sent) than GeNeg (.005 sec/sent).
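To put the reported per-sentence times in perspective, the arithmetic can be sketched as follows. Only the two timings come from the abstract; the corpus size of 10,000 sentences and the helper names are assumptions made for illustration.

```python
# Per-sentence processing times reported in the abstract (seconds per sentence).
LLAMA_SEC_PER_SENT = 5.9
GENEG_SEC_PER_SENT = 0.005

# Assumed corpus size for illustration; not a figure from the study.
N_SENTENCES = 10_000


def slowdown_factor(slow: float, fast: float) -> float:
    """Return how many times slower the first system is than the second."""
    return slow / fast


def total_seconds(sec_per_sent: float, n_sentences: int) -> float:
    """Return the total processing time in seconds for n_sentences."""
    return sec_per_sent * n_sentences


factor = slowdown_factor(LLAMA_SEC_PER_SENT, GENEG_SEC_PER_SENT)
llama_hours = total_seconds(LLAMA_SEC_PER_SENT, N_SENTENCES) / 3600
geneg_seconds = total_seconds(GENEG_SEC_PER_SENT, N_SENTENCES)

print(f"Slowdown factor: about {factor:.0f}x")
print(f"Llama 3.3 on {N_SENTENCES} sentences: about {llama_hours:.1f} h")
print(f"GeNeg on {N_SENTENCES} sentences: about {geneg_seconds:.0f} s")
```

With the reported timings, the LLM is roughly 1,180 times slower per sentence; under the assumed corpus size, that is the difference between about 16 hours and under a minute.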
Conclusion: The results suggest hybrid approaches that combine rule-based efficiency with LLMs' linguistic capabilities. Future work should therefore optimize prompts and integrate LLMs with traditional methods to balance accuracy and efficiency.
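One way such a hybrid could be arranged is to let fast rules decide the clear cases and send only uncertain sentences to an LLM. The sketch below illustrates that routing idea under stated assumptions: the cue lists, the function names, and the stubbed `fake_llm` are all hypothetical and do not reproduce GeNeg's rules or the prompts used in the study.

```python
from typing import Callable, Optional, Tuple

# Hypothetical German negation cues for illustration; not GeNeg's rule set.
CLEAR_CUES = {"kein", "keine", "keinen", "nicht", "ohne"}
# Hypothetical cues treated as ambiguous, which trigger the slower fallback.
AMBIGUOUS_CUES = {"ausgeschlossen", "unwahrscheinlich", "fraglich"}


def rule_based(sentence: str) -> Optional[bool]:
    """Return True/False when the toy rules decide, None when they are unsure."""
    tokens = {t.strip(".,;:()").lower() for t in sentence.split()}
    if tokens & AMBIGUOUS_CUES:
        return None
    return bool(tokens & CLEAR_CUES)


def detect_negation(
    sentence: str, llm_fallback: Callable[[str], bool]
) -> Tuple[bool, str]:
    """Try the cheap rules first; call the LLM only for undecided sentences."""
    decision = rule_based(sentence)
    if decision is not None:
        return decision, "rules"
    return llm_fallback(sentence), "llm"


def fake_llm(sentence: str) -> bool:
    """Stand-in for a real model call; always answers 'negated'."""
    return True


for text in ["Kein Fieber.", "Patient hat Fieber.", "Pneumonie ausgeschlossen."]:
    print(text, detect_negation(text, fake_llm))
```

In this arrangement the expensive model runs only on the fraction of sentences the rules cannot settle, which is where the cost-accuracy trade-off described in the abstract would be tuned.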