Ivana Nanevski, Sebastian Jäger, Matthias Schulte-Althoff, Eva-Maria Behnke, Daniel Fürstenau, Felix Biessmann
Journal of Medical Internet Research, volume 27, e71034. Published 2025-10-08. Publication type: Journal Article. DOI: 10.2196/71034 (https://doi.org/10.2196/71034). Impact factor: 6.0; JCR Q1 (Health Care Sciences & Services).
The Potential of AI in Nursing Care: Multicenter Evaluation in Fall Risk Assessment.
Background: Between 28% and 35% of individuals aged 65 years and older experience falls, and falls are the second leading cause of unintentional injury-related deaths globally. Limited availability of clinical staff often impedes the timely detection and prevention of potential falls. Advances in artificial intelligence (AI) could complement existing fall risk assessment and help better allocate nursing care resources. Yet, many studies are based on small datasets from a single institution, which can restrict the generalizability of the model, and do not investigate important aspects of AI model development, such as fairness across demographic groups.
Objective: This study aimed to provide a comprehensive empirical evaluation of the potential of AI in nursing care, focusing on the case of fall risk prediction. To account for demographic and contextual differences in fall incidence, we analyzed data from a university hospital and a geriatric hospital in Germany. To the best of our knowledge, these are the largest fall risk prediction datasets to date with heterogeneous data distributions. We focused on 3 key questions. First, does AI help in improving fall risk prediction? Second, how can AI models be trained safely across different hospitals? Finally, are these models fair?
Methods: This study used 2 datasets for fall risk prediction: one from a university hospital with 931,726 participants, 10,442 of whom experienced falls, and another from a geriatric hospital with 12,773 participants, 1,728 of whom experienced falls. State-of-the-art AI models were trained with 3 approaches, including 2 decentralized learning paradigms. First, separate models were trained on data from each hospital; second, models were retrained on the respective other dataset; and third, federated learning (FL) was applied to both datasets. The performance of these models was compared with the rule-based systems as implemented in clinical practice for fall risk prediction. Additional analyses were conducted to test for model fairness.
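The abstract does not say which FL algorithm was used. As a rough illustration of the general idea only, the sketch below shows federated averaging (FedAvg) with a logistic regression model in plain Python: each site takes a few gradient steps on its own data, and only model weights, weighted by local sample count, are combined. The function names, the model, and the hyperparameters are all hypothetical and are not taken from the study.

```python
import math

def sigmoid(z):
    # Numerically safe logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def local_update(weights, X, y, lr=0.1, epochs=5):
    """Run a few epochs of full-batch gradient descent on one site's data."""
    w = list(weights)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            for j, xj in enumerate(xi):
                grad[j] += (p - yi) * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def fed_avg(site_weights, site_sizes):
    """Average the site models, weighting each by its number of samples."""
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [
        sum(w[j] * n for w, n in zip(site_weights, site_sizes)) / total
        for j in range(dim)
    ]

def federated_training(sites, dim, rounds=10):
    """sites: list of (X, y) pairs. Raw records stay at each site;
    only weight vectors are exchanged (hypothetical setup)."""
    global_w = [0.0] * dim
    for _ in range(rounds):
        updates = [local_update(global_w, X, y) for X, y in sites]
        global_w = fed_avg(updates, [len(X) for X, _ in sites])
    return global_w
```

Under sample-count weighting of this kind, a site with 931,726 participants would carry roughly 73 times the weight of a site with 12,773, so the aggregated model would be dominated by the larger site. Whether the study's FL setup used this weighting is not stated.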
Results: Our findings demonstrate that AI models consistently outperform rule-based systems across all experimental setups, with an area under the receiver operating characteristic curve (AUROC) of 0.735 (90% CI 0.727-0.744) for the geriatric hospital and 0.926 (90% CI 0.924-0.928) for the university hospital. FL did not improve the fall risk prediction in this setting. Our fairness analysis ruled out disparities in model performance between sex groups, but we found fairness infringements across age groups.
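The abstract reports AUROC values with 90% CIs but does not describe how the intervals were obtained. One common approach is a percentile bootstrap, sketched below in plain Python under that assumption. The helper names `auroc` and `bootstrap_ci`, the number of resamples, and the seed are hypothetical choices for illustration.

```python
import random

def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney U) formulation, with tie handling."""
    pairs = sorted(zip(scores, labels))
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs both classes")
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1..j
        rank_sum_pos += avg_rank * sum(lbl for _, lbl in pairs[i:j])
        i = j
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)

def bootstrap_ci(labels, scores, level=0.90, n_boot=1000, seed=0):
    """Percentile bootstrap confidence interval for the AUROC."""
    auroc(labels, scores)  # raises early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[k] for k in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one class
            stats.append(auroc(ys, [scores[k] for k in idx]))
    stats.sort()
    alpha = (1.0 - level) / 2.0
    lo = stats[int(alpha * n_boot)]
    hi = stats[min(n_boot - 1, int((1.0 - alpha) * n_boot))]
    return lo, hi
```

For example, `auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` evaluates to 0.75, because 3 of the 4 positive-negative pairs are ranked correctly.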
Conclusions: This study demonstrates that AI models consistently outperform traditional rule-based systems across heterogeneous datasets in predicting fall risk. However, it also reveals challenges related to demographic shifts and label distribution imbalances, which limited the FL models' ability to generalize. While the fairness analysis indicated fair results across sex subgroups, age-related disparities emerged. Addressing data imbalances and ensuring broader representation across demographic groups will be crucial for developing fairer and more generalizable models.
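The abstract does not name the fairness criterion behind the age-related disparities. As one possible way to make such a disparity concrete, the sketch below compares the true positive rate (the share of actual fallers that the model flags) across subgroups, a criterion often called equal opportunity. The metric choice, function names, and age bands are assumptions for illustration, not the study's method.

```python
from collections import defaultdict

def tpr_by_group(labels, preds, groups):
    """True positive rate (recall on actual fallers) for each subgroup."""
    tp = defaultdict(int)
    pos = defaultdict(int)
    for y, p, g in zip(labels, preds, groups):
        if y == 1:
            pos[g] += 1
            tp[g] += int(p == 1)
    return {g: tp[g] / pos[g] for g in pos}

def equal_opportunity_gap(labels, preds, groups):
    """Largest TPR difference between any two subgroups (0 means parity)."""
    rates = tpr_by_group(labels, preds, groups)
    return max(rates.values()) - min(rates.values())

# Toy data with hypothetical age bands
labels = [1, 1, 1, 1, 0, 0]
preds = [1, 1, 1, 0, 0, 1]
groups = ["65-79", "65-79", "80+", "80+", "65-79", "80+"]
gap = equal_opportunity_gap(labels, preds, groups)
```

In this toy data both fallers in the "65-79" band are flagged, but only one of the two in the "80+" band is, so the gap is 0.5.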
About the journal:
The Journal of Medical Internet Research (JMIR) is a journal in the field of health informatics and health services. Founded in 1999, it has been publishing for over two decades.
The journal focuses on digital health, data science, health informatics, and emerging technologies for health, medicine, and biomedical research. It ranks in the first quartile (Q1) by Impact Factor.
JMIR is ranked #1 on Google Scholar within the "Medical Informatics" discipline.