Machine learning for automated cause-of-death classification from 2021 to 2022 in Korea: development and validation of an ICD-10 prediction model.

Impact factor 0.2; JCR Q3 (Medicine, General & Internal)
Ewha Medical Journal. Pub Date: 2025-07-01; Epub Date: 2025-07-28; DOI: 10.12771/emj.2025.00675
Seokmin Lee, Gyeongmin Im
Ewha Medical Journal 2025;48(3):e45. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12362283/pdf/

Abstract

Purpose: This study evaluated the feasibility and performance of a deep learning approach utilizing the Korean Medical BERT (KM-BERT) model for the automated classification of underlying causes of death within national mortality statistics. It aimed to assess predictive accuracy throughout the cause-of-death coding workflow and to identify limitations and opportunities for further artificial intelligence (AI) integration.

Methods: We performed a retrospective prediction study using 693,587 death certificates issued in Korea between January 2021 and December 2022. The free-text fields for immediate, antecedent, and contributory causes were concatenated and used to fine-tune KM-BERT. Three classification models were developed: (1) final underlying cause prediction (International Classification of Diseases, 10th Revision [ICD-10] code) from certificate inputs, (2) tentative underlying cause selection based on ICD-10 Volume 2 rules, and (3) classification of individual cause-of-death entries. Models were trained and validated on 2021 data (80% training, 20% validation) and evaluated on 2022 data. Performance metrics included overall accuracy, weighted F1 score, and macro F1 score.
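The input construction and year-based split described in the Methods can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Certificate` record, the `[SEP]` separator, skipping blank fields, and the unstratified random 80/20 shuffle are all hypothetical choices, and the KM-BERT fine-tuning step itself is not shown. Taking the abstract's counts at face value and assuming every certificate not from 2022 is from 2021, 693,587 − 306,898 = 386,689 certificates would be available for development, which an 80/20 split divides into about 309,351 training and 77,338 validation certificates.

```python
import random
from dataclasses import dataclass

# Hypothetical separator; BERT-style tokenizers treat [SEP] as a special token.
SEP = " [SEP] "


@dataclass
class Certificate:
    """Hypothetical record layout; field names are illustrative, not the study's schema."""
    year: int
    immediate: str      # immediate cause (assumed WHO-style Part I, first line)
    antecedent: str     # antecedent causes (remaining Part I lines), pre-joined
    contributory: str   # contributory conditions (assumed Part II)
    label: str          # target ICD-10 code for the chosen task


def build_input(cert: Certificate) -> str:
    """Concatenate the free-text cause fields in certificate order, skipping blank ones."""
    fields = (cert.immediate, cert.antecedent, cert.contributory)
    return SEP.join(f.strip() for f in fields if f and f.strip())


def temporal_split(certs, dev_year=2021, test_year=2022, train_frac=0.8, seed=0):
    """Shuffle dev_year certificates into train/validation; hold out test_year for testing."""
    dev = [c for c in certs if c.year == dev_year]
    test = [c for c in certs if c.year == test_year]
    random.Random(seed).shuffle(dev)  # simple random split (assumed; no stratification)
    cut = int(len(dev) * train_frac)
    return dev[:cut], dev[cut:], test


if __name__ == "__main__":
    # Label value is illustrative only.
    cert = Certificate(2021, "Sepsis", "Pneumonia", "Type 2 diabetes", "J18.9")
    print(build_input(cert))  # Sepsis [SEP] Pneumonia [SEP] Type 2 diabetes
```

Splitting by year rather than at random means the test score reflects how well a model trained on one year's certificates carries over to the next year's, which is closer to how an automated coder would be used in routine statistics production.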

Results: On the 306,898 certificates from 2022, the final cause model achieved 62.65% accuracy (F1-weighted, 0.5940; F1-macro, 0.1503). The tentative cause model demonstrated 95.35% accuracy (F1-weighted, 0.9516; F1-macro, 0.4996). The individual entry model yielded 79.51% accuracy (F1-weighted, 0.7741; F1-macro, 0.9250). Error analysis indicated reduced reliability for rare diseases and for specific ICD chapters whose coding requires supplementary administrative data.
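The gap between accuracy, weighted F1, and macro F1 in these results follows from how the averages are defined. Weighted F1 averages per-class F1 scores in proportion to each class's number of true instances, so common codes dominate; macro F1 gives every ICD-10 code equal weight, so many rare codes with near-zero F1 pull it down. The sketch below computes all three from scratch and is intended to follow scikit-learn's `f1_score` conventions (`average="weighted"` and `average="macro"` over the union of true and predicted labels); the toy codes and counts are hypothetical and are not drawn from the study.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """F1 for every label that appears in y_true or y_pred."""
    labels = set(y_true) | set(y_pred)
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n_pred = Counter(y_pred)
    n_true = Counter(y_true)
    # F1 = 2TP / (2TP + FP + FN) = 2TP / (#predicted + #true).
    # The denominator is >= 1 because each label occurs in y_true or y_pred.
    return {lab: 2 * tp[lab] / (n_pred[lab] + n_true[lab]) for lab in labels}


def summarize(y_true, y_pred):
    """Return (accuracy, weighted F1, macro F1)."""
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    f1 = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    # Weighted: each class counts in proportion to its true frequency.
    weighted = sum(f1[lab] * support[lab] for lab in support) / n
    # Macro: every class counts equally, however rare.
    macro = sum(f1.values()) / len(f1)
    return accuracy, weighted, macro


if __name__ == "__main__":
    # Hypothetical toy data: one common code, two rare codes the model never predicts.
    y_true = ["I21"] * 8 + ["C34", "A41"]
    y_pred = ["I21"] * 10
    acc, w, m = summarize(y_true, y_pred)
    print(f"accuracy {acc:.3f}, weighted F1 {w:.3f}, macro F1 {m:.3f}")
    # prints: accuracy 0.800, weighted F1 0.711, macro F1 0.296
```

Missing only the two rare codes leaves accuracy at 0.8 but cuts macro F1 to under 0.3, the same qualitative pattern as a classifier that handles frequent causes well and rare ones poorly.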

Conclusion: Although the models performed strongly in mapping free-text inputs to ICD-10 codes and in selecting tentative underlying causes, improved data quality, integration of administrative records, and further model refinement are still needed. A systematic, long-term approach is essential for the broad adoption of AI in mortality statistics.
