Evaluating Locally Run Large Language Models for Obstructive Sleep Apnea Diagnosis and Treatment: A Real-World Polysomnography Study.

IF 3.4; Zone 2 (Medicine); JCR Q2, Clinical Neurology

Nature and Science of Sleep. Pub Date: 2025-07-08. eCollection Date: 2025-01-01. DOI: 10.2147/NSS.S536823

Christopher Seifen, Tilman Huppertz, Katharina Bahr-Hamm, Haralampos Gouveris, Johannes Pordzik, Jonas Eckrich, Christoph Matthias, Harry Smith, Tom Kelsey, Andrew Blaikie, Sebastian Kuhn, Christoph Raphael Buhr

Nature and Science of Sleep, volume 17, pages 1587-1599. Journal Article. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12257169/pdf/

Citations: 0

Abstract

Purpose: Sleep medicine is a highly resource-intensive field where large language models (LLMs) could offer a promising solution by supporting diagnostic processes. As web-based LLMs have obvious data protection constraints, locally run LLMs are essential for clinical implementation. This study is the first to investigate the performance of locally run LLMs in the interpretation of real-world polysomnographic (PSG) results.

Methods: From the clinical database of our sleep laboratory, we randomly selected N=30 patients who underwent PSG due to clinical complaints typical of obstructive sleep apnea (OSA) (18 male, 12 female, mean age 50.5 ± 11.1 years, mean body mass index 29.7 ± 5.5 kg/m², mean apnea-hypopnea index 30.9 ± 23.8). A board-certified sleep physician's interpretations of diagnosis, suitable first-line therapy or alternative therapy were compared with those of three locally run LLMs (Gemma2, Llama3 and Mistral Nemo) to assess the level of concordance.
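As a rough illustration of what grading OSA severity from the apnea-hypopnea index (AHI) involves, the sketch below maps an AHI value to a severity label. The abstract does not state which cut-offs the physician or the LLMs applied; the thresholds used here (5, 15 and 30 events per hour) are the conventional adult cut-offs and are an assumption, as is the function name `osa_severity`.

```python
def osa_severity(ahi: float) -> str:
    """Map an apnea-hypopnea index (events per hour) to a severity label.

    Assumption: conventional adult cut-offs (5 / 15 / 30 events per hour);
    the abstract does not say which thresholds were actually used.
    """
    if ahi < 0:
        raise ValueError("AHI cannot be negative")
    if ahi < 5:
        return "none"
    if ahi < 15:
        return "mild"
    if ahi < 30:
        return "moderate"
    return "severe"


if __name__ == "__main__":
    # The cohort's mean AHI reported in the abstract.
    print(osa_severity(30.9))
```

Under these assumed cut-offs, the cohort's mean AHI of 30.9 sits just above the boundary between moderate and severe, which is where a grading step is most sensitive to small differences in the reported index.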

Results: Gemma2 showed the lowest concordance of 33% (10/30 patients) with the board-certified sleep physician regarding OSA severity, followed by Mistral Nemo at 47% (14/30 patients) and Llama3 at 50% (15/30 patients). For automatic positive airway pressure (aPAP) recommendations, Mistral Nemo showed the highest concordance at 90% (27/30 patients), followed by Gemma2 and Llama3 with 83% (25/30 patients) each.
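The concordance figures above are simple proportions of the 30 patients. The sketch below recomputes them from the reported counts and, because the sample is small, also derives a 95% Wilson score interval for each proportion. The intervals are not reported in the abstract; they are an added illustration, and the helper names `concordance_pct` and `wilson_interval` are hypothetical.

```python
import math

# Agreement counts reported in the abstract: (patients in agreement, total patients).
SEVERITY = {"Gemma2": (10, 30), "Mistral Nemo": (14, 30), "Llama3": (15, 30)}
APAP = {"Mistral Nemo": (27, 30), "Gemma2": (25, 30), "Llama3": (25, 30)}


def concordance_pct(agree: int, total: int) -> float:
    """Percentage of patients on whom model and physician agreed."""
    return 100.0 * agree / total


def wilson_interval(agree: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%).

    Not part of the source study; added to show how wide the
    uncertainty is when only 30 patients are compared.
    """
    p = agree / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


if __name__ == "__main__":
    for label, table in (("OSA severity", SEVERITY), ("aPAP recommendation", APAP)):
        for model, (agree, total) in table.items():
            low, high = wilson_interval(agree, total)
            print(f"{label:20s} {model:13s} {concordance_pct(agree, total):5.1f}% "
                  f"(95% CI {100 * low:.1f}-{100 * high:.1f}%)")
```

Rounding the computed percentages to whole numbers reproduces the values given in the abstract (33%, 47%, 50%, 90% and 83%).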

Conclusion: Although locally run LLMs bypass data security constraints and show promising potential for clinical practice, their performance needs significant improvement. At present, they require further refinement and fine-tuning before they can be used routinely to interpret real-world PSG results.




Source journal

Nature and Science of Sleep (Neuroscience-Behavioral Neuroscience)

CiteScore: 5.70
Self-citation rate: 5.90%
Articles published: 245
Review time: 16 weeks

Journal description: Nature and Science of Sleep is an international, peer-reviewed, open access journal covering all aspects of sleep science and sleep medicine, including the neurophysiology and functions of sleep, the genetics of sleep, sleep and society, biological rhythms, dreaming, sleep disorders and therapy, and strategies to optimize healthy sleep.

Specific topics covered in the journal include:
The functions of sleep in humans and other animals;
Physiological and neurophysiological changes with sleep;
The genetics of sleep and sleep differences;
The neurotransmitters, receptors and pathways involved in controlling both sleep and wakefulness;
Behavioral and pharmacological interventions aimed at improving sleep, and improving wakefulness;
Sleep changes with development and with age;
Sleep and reproduction (e.g., changes across the menstrual cycle, with pregnancy and menopause);
The science and nature of dreams;
Sleep disorders;
Impact of sleep and sleep disorders on health, daytime function and quality of life;
Sleep problems secondary to clinical disorders;
Interaction of society with sleep (e.g., consequences of shift work, occupational health, public health);
The microbiome and sleep;
Chronotherapy;
Impact of circadian rhythms on sleep, physiology, cognition and health;
Mechanisms controlling circadian rhythms, centrally and peripherally;
Impact of circadian rhythm disruptions (including night shift work, jet lag and social jet lag) on sleep, physiology, cognition and health;
Behavioral and pharmacological interventions aimed at reducing adverse effects of circadian-related sleep disruption;
Assessment of technologies and biomarkers for measuring sleep and/or circadian rhythms;
Epigenetic markers of sleep or circadian disruption.