Sleep apnea test prediction based on Electronic Health Records.

IF 4.0, Zone 2 (Medicine), JCR Q2, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
Lama Abu Tahoun, Amit Shay Green, Tal Patalon, Yaron Dagan, Robert Moskovitch
Journal of Biomedical Informatics, article 104737. Published 2024-11-01. DOI: 10.1016/j.jbi.2024.104737 (https://doi.org/10.1016/j.jbi.2024.104737)
Citations: 0

Abstract

Obstructive Sleep Apnea (OSA) is diagnosed with a polysomnography test, which is often performed only at later ages. Being able to notify insured members who may be at risk at an earlier age is desirable. To that end, we develop predictive models that rely on Electronic Health Records (EHR) and predict whether a person will undergo a sleep apnea test after the age of 50. A major challenge is the variability of EHR records across insured members over the years, which this study also investigates in the context of control matching and prediction. Because there are many temporal variables, the RankLi method was introduced for temporal variable selection. It uses the t-test to compute, for each temporal variable, a divergence score between the target classes. We also investigate whether the number of EHR records should be considered in control matching, and whether modeling subgroups separately according to their number of EHR records is more effective. For each prediction task, we trained four classifiers (1-CNN, LSTM, Random Forest, and Logistic Regression) on data up to the age of 40 or 50 and with several numbers of temporal variables. Using the number of EHR records in control matching was found to be crucial, and training separate models for subsets of the population according to their number of EHR records was found to be more effective. The deep learning models, particularly the 1-CNN, achieved the highest balanced accuracy and AUC scores in both the male and female groups. In the male group, the highest results were also observed at age 50 with 100 temporal variables, with a balanced accuracy of 90% and an AUC of 93%.
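
The t-test-based variable selection described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how RankLi computes its divergence score, so everything below is an assumption made for illustration: each temporal variable is collapsed to one summary value per person, the score is the absolute Welch t-statistic between cases and controls, and the names `welch_t`, `rank_variables`, `bmi`, and `systolic_bp` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent samples (unequal variances)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def rank_variables(cases, controls, top_k):
    """Rank variables by |t| between the two classes and keep the top_k.

    cases / controls map a variable name to a list with one value per person.
    This is an illustrative stand-in, not the RankLi method itself.
    """
    scores = {name: abs(welch_t(cases[name], controls[name])) for name in cases}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Hypothetical toy data: one summary value per person for each variable.
cases = {
    "bmi": [31.0, 33.5, 29.8, 35.1],
    "systolic_bp": [128.0, 135.0, 122.0, 140.0],
}
controls = {
    "bmi": [24.0, 26.1, 23.5, 25.2],
    "systolic_bp": [126.0, 133.0, 124.0, 138.0],
}

# Keep the single variable that separates the two classes most strongly.
selected = rank_variables(cases, controls, top_k=1)
```

In this toy data the two groups differ far more in `bmi` than in `systolic_bp`, so `bmi` ranks first. A real pipeline would apply the same ranking to many variables and pass the top-ranked ones to the classifiers.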

Source journal: Journal of Biomedical Informatics (Medicine; Computer Science, Interdisciplinary Applications)
CiteScore: 8.90
Self-citation rate: 6.70%
Articles published: 243
Review time: 32 days
About the journal: The Journal of Biomedical Informatics reflects a commitment to high-quality original research papers, reviews, and commentaries in the area of biomedical informatics methodology. Although we publish articles motivated by applications in the biomedical sciences (for example, clinical medicine, health care, population health, and translational bioinformatics), the journal emphasizes reports of new methodologies and techniques that have general applicability and that form the basis for the evolving science of biomedical informatics. Articles on medical devices; evaluations of implemented systems (including clinical trials of information technologies); or papers that provide insight into a biological process, a specific disease, or treatment options would generally be more suitable for publication in other venues. Papers on applications of signal processing and image analysis are often more suitable for biomedical engineering journals or other informatics journals, although we do publish papers that emphasize the information management and knowledge representation/modeling issues that arise in the storage and use of biological signals and images. System descriptions are welcome if they illustrate and substantiate the underlying methodology that is the principal focus of the report and an effort is made to address the generalizability and/or range of application of that methodology. Note also that, given the international nature of JBI, papers that deal with specific languages other than English, or with country-specific health systems or approaches, are acceptable for JBI only if they offer generalizable lessons that are relevant to the broad JBI readership, regardless of their country, language, culture, or health system.