Leveraging large language models for patient-ventilator asynchrony detection.

Impact Factor: 4.4 | JCR Quartile: Q1 (Health Care Sciences & Services)
Francesc Suñol, Candelaria de Haro, Verónica Santos-Pulpón, Sol Fernández-Gonzalo, Lluís Blanch, Josefina López-Aguilar, Leonardo Sarlabous
{"title":"Leveraging large language models for patient-ventilator asynchrony detection.","authors":"Francesc Suñol, Candelaria de Haro, Verónica Santos-Pulpón, Sol Fernández-Gonzalo, Lluís Blanch, Josefina López-Aguilar, Leonardo Sarlabous","doi":"10.1136/bmjhci-2024-101426","DOIUrl":null,"url":null,"abstract":"<p><strong>Objectives: </strong>The objective of this study is to evaluate whether large language models (LLMs) can achieve performance comparable to expert-developed deep neural networks in detecting flow starvation (FS) asynchronies during mechanical ventilation.</p><p><strong>Methods: </strong>Popular LLMs (GPT-4, Claude-3.5, Gemini-1.5, DeepSeek-R1) were tested on a dataset of 6500 airway pressure cycles from 28 patients, classifying breaths into three FS categories. They were also tasked with generating executable code for one-dimensional convolutional neural network (CNN-1D) and Long Short-Term Memory networks. Model performances were assessed using repeated holdout validation and compared with expert-developed models.</p><p><strong>Results: </strong>LLMs performed poorly in direct FS classification (accuracy: GPT-4: 0.497; Claude-3.5: 0.627; Gemini-1.5: 0.544, DeepSeek-R1: 0.520). However, Claude-3.5-generated CNN-1D code achieved the highest accuracy (0.902 (0.899-0.906)), outperforming expert-developed models.</p><p><strong>Discussion: </strong>LLMs demonstrated limited capability in direct classification but excelled in generating effective neural network models with minimal human intervention. This suggests LLMs' potential in accelerating model development for clinical applications, particularly for detecting patient-ventilator asynchronies, though their clinical implementation requires further validation and consideration of ethical factors.</p>","PeriodicalId":9050,"journal":{"name":"BMJ Health & Care Informatics","volume":"32 1","pages":""},"PeriodicalIF":4.4000,"publicationDate":"2025-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12207101/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMJ Health & Care Informatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1136/bmjhci-2024-101426","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}
Citations: 0

Abstract

Objectives: The objective of this study is to evaluate whether large language models (LLMs) can achieve performance comparable to expert-developed deep neural networks in detecting flow starvation (FS) asynchronies during mechanical ventilation.

Methods: Popular LLMs (GPT-4, Claude-3.5, Gemini-1.5, DeepSeek-R1) were tested on a dataset of 6500 airway pressure cycles from 28 patients, classifying breaths into three FS categories. They were also tasked with generating executable code for one-dimensional convolutional neural network (CNN-1D) and long short-term memory (LSTM) models. Model performance was assessed using repeated holdout validation and compared with expert-developed models.
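
The paper's network architectures are not reproduced here, but the following is a minimal sketch of what an LLM-generated CNN-1D classifier for this task might look like, written in Python with TensorFlow/Keras. The cycle length of 200 samples, the layer sizes, and the training settings are illustrative assumptions, not details taken from the study.

```python
# Minimal sketch (not the authors' code): a 1D CNN that classifies
# fixed-length airway pressure cycles into three FS categories.
# Assumes each cycle has been resampled to 200 samples.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

N_SAMPLES = 200   # assumed length of each resampled pressure cycle
N_CLASSES = 3     # FS severity categories described in the abstract

def build_cnn_1d() -> tf.keras.Model:
    model = models.Sequential([
        layers.Input(shape=(N_SAMPLES, 1)),
        layers.Conv1D(16, kernel_size=7, activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.Conv1D(32, kernel_size=5, activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(N_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

if __name__ == "__main__":
    # Placeholder data shaped like the described dataset (6500 cycles).
    X = np.random.rand(6500, N_SAMPLES, 1).astype("float32")
    y = np.random.randint(0, N_CLASSES, size=6500)
    model = build_cnn_1d()
    model.fit(X, y, epochs=2, batch_size=64, validation_split=0.2)
```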

Results: LLMs performed poorly in direct FS classification (accuracy: GPT-4: 0.497; Claude-3.5: 0.627; Gemini-1.5: 0.544; DeepSeek-R1: 0.520). However, Claude-3.5-generated CNN-1D code achieved the highest accuracy (0.902 (0.899-0.906)), outperforming expert-developed models.
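
As a point of reference for how accuracies with intervals like those above are typically obtained, below is a minimal sketch of repeated holdout validation that reports a mean accuracy with an interval across repeats. The classifier, the number of repeats, and the normal-approximation interval are assumptions for illustration; the paper's exact aggregation procedure may differ.

```python
# Minimal sketch of repeated holdout validation: repeatedly split the
# data, refit a classifier, and summarise test accuracy across repeats.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def repeated_holdout_accuracy(X, y, n_repeats=20, test_size=0.2, seed=0):
    accs = []
    for r in range(n_repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, stratify=y, random_state=seed + r)
        clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        accs.append(accuracy_score(y_te, clf.predict(X_te)))
    accs = np.asarray(accs)
    # Normal-approximation interval across repeats (an assumption here).
    half = 1.96 * accs.std(ddof=1) / np.sqrt(n_repeats)
    return accs.mean(), (accs.mean() - half, accs.mean() + half)

if __name__ == "__main__":
    # Synthetic stand-in for breath features and three-class labels.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(6500, 10))
    y = rng.integers(0, 3, size=6500)
    mean_acc, ci = repeated_holdout_accuracy(X, y)
    print(f"accuracy {mean_acc:.3f} ({ci[0]:.3f}-{ci[1]:.3f})")
```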

Discussion: LLMs demonstrated limited capability in direct classification but excelled in generating effective neural network models with minimal human intervention. This suggests LLMs' potential in accelerating model development for clinical applications, particularly for detecting patient-ventilator asynchronies, though their clinical implementation requires further validation and consideration of ethical factors.


Source journal metrics
CiteScore: 6.10
Self-citation rate: 4.90%
Annual publications: 40
Review time: 18 weeks