Communicating exploratory unsupervised machine learning analysis in age clustering for paediatric disease.

Journal impact factor: 4.1; JCR quartile: Q1 (Health Care Sciences & Services)
Joshua William Spear, Eleni Pissaridou, Stuart Bowyer, William A Bryant, Daniel Key, John Booth, Anastasia Spiridou, Spiros Denaxas, Rebecca Pope, Andrew M Taylor, Harry Hemingway, Neil J Sebire
Journal: BMJ Health & Care Informatics
Publication date: 2024-07-29
Publication type: Journal Article
DOI: 10.1136/bmjhci-2023-100963 (https://doi.org/10.1136/bmjhci-2023-100963)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11288139/pdf/
Citations: 0

Abstract

Background: Despite the increasing availability of electronic healthcare record (EHR) data and the wide availability of plug-and-play machine learning (ML) application programming interfaces, the adoption of data-driven decision-making within routine hospital workflows has thus far remained limited. Through the lens of deriving clusters of diagnoses by age, this study investigated the types of ML analysis that can be performed using EHR data and how the results could be communicated to lay stakeholders.

Methods: Observational EHR data from a tertiary paediatric hospital were used; after preprocessing, the data contained 61 522 unique patients and 3315 unique ICD-10 diagnosis codes. K-means clustering was applied to identify age distributions of patient diagnoses. The final model was selected using quantitative metrics and expert assessment of the clinical validity of the clusters. Additionally, uncertainty over preprocessing decisions was analysed.
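As an illustration of the clustering step described in the Methods, the following is a minimal sketch only, not the authors' implementation. It assumes each diagnosis is represented by a normalised histogram of patient ages in one-year bins (0 to 17), runs a plain k-means on those histograms, and reports the within-cluster sum of squares as one possible quantitative metric. The toy diagnosis names, the bin width, and the choice of metric are all assumptions; the abstract does not specify them. In practice a library implementation such as scikit-learn's `KMeans` would normally be used instead of the hand-written loop.

```python
import random

AGE_BINS = 18  # assumed: one bin per year of age, 0-17


def age_profile(ages):
    """Normalised age histogram for one diagnosis (assumed feature representation)."""
    counts = [0] * AGE_BINS
    for age in ages:
        counts[min(int(age), AGE_BINS - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns labels, centroids and inertia."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    for _ in range(n_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    labels = assign(points, centroids)
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia


# Hypothetical toy data: ages (years) at which each made-up diagnosis was recorded.
toy = {
    "dx_infant_a": [0, 0, 0, 1, 0],
    "dx_infant_b": [0, 0, 1, 0],
    "dx_teen_a": [14, 15, 16, 17, 15],
    "dx_teen_b": [13, 16, 17, 16],
}
names = list(toy)
profiles = [age_profile(toy[name]) for name in names]

# Compare candidate values of k using inertia (an assumed stand-in metric).
for k in (1, 2, 3):
    _, _, inertia_k = kmeans(profiles, k)
    print(f"k={k} inertia={inertia_k:.3f}")

labels, centroids, inertia = kmeans(profiles, k=2)
for name, lab in zip(names, labels):
    print(name, "-> cluster", lab)
```

On the toy data the two infant-onset and the two adolescent-onset diagnoses fall into separate clusters. A numeric metric such as inertia only narrows the choice of k; the abstract states that the final model was also selected through expert assessment of clinical validity, which this sketch does not attempt to reproduce.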

Findings: Four age clusters of diseases were identified, broadly aligning to the age ranges 0 to 1, 1 to 5, 5 to 13 and 13 to 18 years. Diagnoses within the clusters aligned with existing knowledge regarding the propensity of presentation at different ages, and sequential clusters presented known disease progressions. The results validated similar methodologies within the literature. The impact of uncertainty induced by preprocessing decisions was large at the level of individual diagnoses but not at the population level. Strategies for mitigating, or communicating, this uncertainty were successfully demonstrated.

Conclusion: Unsupervised ML applied to EHR data identifies clinically relevant age distributions of diagnoses, which can augment existing decision-making. However, biases within healthcare datasets dramatically affect results if not appropriately mitigated or communicated.

Source journal metrics: CiteScore 6.10; self-citation rate 4.90%; articles published 40; review time 18 weeks.