“Using network analysis modularity to group health code systems and decrease dimensionality in machine learning models”

Mohsen Askar, Lars Småbrekke, Einar Holsbø, Lars Ailo Bongo, Kristian Svendsen
Exploratory Research in Clinical and Social Pharmacy, Volume 14, Article 100463. Published 2024-06-01.
DOI: 10.1016/j.rcsop.2024.100463
Impact factor 1.8; JCR Q3 (Pharmacology & Pharmacy)

Abstract

Background

Machine learning (ML) prediction models in healthcare and pharmacy-related research face challenges with encoding high-dimensional Healthcare Coding Systems (HCSs) such as ICD, ATC, and DRG codes, given the trade-off between reducing model dimensionality and minimizing information loss.

Objectives

To investigate network analysis modularity as a method for grouping HCSs to improve their encoding in ML models.

Methods

The MIMIC-III dataset was used to create a multimorbidity network in which the nodes are ICD-9 codes and each edge is weighted by the number of patients who share that pair of ICD-9 codes. A modularity detection algorithm was applied at different resolution thresholds to generate 6 sets of modules. The impact of four encoding strategies on the performance of predicting 90-day Intensive Care Unit readmissions was assessed. The strategies compared were: 1) binary encoding of ungrouped codes, 2) encoding codes grouped by network modules, 3) grouping codes to the highest level of the ICD-9 hierarchy, and 4) grouping using the single-level Clinical Classification Software (CCS). The same methodology was also applied to DRG codes, but there the comparison was limited to a single modularity threshold versus binary encoding.
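
The network construction and module-based encoding described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' package: the toy patient records, the hand-picked `module_of` partition, and the function names `build_edges`, `modularity`, and `encode_by_module` are all hypothetical. The abstract does not name the detection algorithm. In practice a library routine that accepts a resolution parameter (for example `louvain_communities` in NetworkX) would propose the partition; the `modularity` function here only scores a given one.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: patient id -> set of ICD-9 codes recorded for that patient.
patients = {
    "p1": {"4280", "42731", "5849"},
    "p2": {"4280", "42731"},
    "p3": {"25000", "4019"},
    "p4": {"25000", "4019", "5849"},
    "p5": {"4280", "42731", "4019"},
}


def build_edges(patient_codes):
    """Edge weight = number of patients in whom both codes of a pair occur."""
    edges = Counter()
    for codes in patient_codes.values():
        for a, b in combinations(sorted(codes), 2):
            edges[(a, b)] += 1
    return edges


def modularity(edges, module_of, resolution=1.0):
    """Weighted modularity Q of a partition.

    Q = sum over modules of  L_c / m  -  resolution * (d_c / 2m)^2,
    where m is the total edge weight, L_c the edge weight inside module c,
    and d_c the summed weighted degree of the nodes in module c.
    """
    total = sum(edges.values())
    within = Counter()           # L_c per module
    module_strength = Counter()  # d_c per module
    for (a, b), w in edges.items():
        module_strength[module_of[a]] += w
        module_strength[module_of[b]] += w
        if module_of[a] == module_of[b]:
            within[module_of[a]] += w
    return sum(
        within[m] / total - resolution * (module_strength[m] / (2 * total)) ** 2
        for m in module_strength
    )


def encode_by_module(patient_codes, module_of):
    """One binary feature per module: 1 if the patient has any code in it."""
    modules = sorted(set(module_of.values()))
    return {
        pid: [int(any(module_of.get(c) == m for c in codes)) for m in modules]
        for pid, codes in patient_codes.items()
    }


edges = build_edges(patients)

# Hand-picked partition standing in for the output of a detection algorithm.
module_of = {"4280": 0, "42731": 0, "5849": 0, "25000": 1, "4019": 1}

print("Q =", round(modularity(edges, module_of), 4))
encoded = encode_by_module(patients, module_of)
for pid, row in sorted(encoded.items()):
    print(pid, row)
```

In this toy case five code columns collapse into two module columns. In the formula used here, raising `resolution` increases the penalty on large modules, so a detection algorithm optimising this score would tend to return more, smaller modules; that is one plausible way to obtain several module sets of different granularity.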

Performance was assessed with three algorithms: Logistic Regression, a Support Vector Machine with a non-linear kernel, and Gradient Boosting Machines. Accuracy, precision, recall, AUC, and F1-score were reported with 95% confidence intervals.
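
As an illustration of reporting a metric with a 95% confidence interval, the sketch below computes accuracy, precision, recall, and F1 on hypothetical labels and attaches a percentile-bootstrap interval to accuracy. The abstract does not say how the intervals were obtained, so the bootstrap is an assumption, as are the toy labels and the function names.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def bootstrap_ci(y_true, y_pred, metric, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval (an assumed method, not taken from the paper)."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        # Resample patients with replacement and recompute the metric.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    scores.sort()
    lower = scores[int((alpha / 2) * n_resamples)]
    upper = scores[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical labels: 1 = readmitted within 90 days, 0 = not readmitted.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

acc = accuracy(y_true, y_pred)
low, high = bootstrap_ci(y_true, y_pred, accuracy)
print("accuracy:", acc, "95% CI:", (low, high))
print("precision, recall, F1:", precision_recall_f1(y_true, y_pred))
```

With only ten toy observations the interval is very wide; the same functions apply unchanged to a full held-out test set.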

Results

Models that used modularity encoding outperformed models that used binary encoding of ungrouped codes. Accuracy improved across all algorithms, ranging from 0.736 to 0.78 for modularity encoding compared with 0.727 to 0.779 for binary encoding. AUC, recall, and precision also improved across almost all algorithms. Compared with the other grouping approaches, modularity encoding generally showed slightly higher AUC, ranging from 0.813 to 0.837, and slightly higher precision, ranging from 0.752 to 0.782.

Conclusions

Modularity encoding enhances the performance of ML models in pharmacy research by reducing dimensionality while retaining the necessary information. Across the three algorithms used, models with modularity encoding performed better than or comparably to models with the other encoding approaches. Modularity encoding has further advantages: it can be used for both hierarchical and non-hierarchical HCSs, the resulting groups are clinically relevant, and it can improve the clinical interpretability of ML models. A Python package has been developed to facilitate use of the approach in future research.