Xiaobo Li, Yijia Zhang, Xiaodi Hou, Shilong Wang, Hongfei Lin
Artificial Intelligence in Medicine, Volume 168, Article 103187. Published 2025-07-10.
DOI: 10.1016/j.artmed.2025.103187
Impact factor: 6.2 (JCR Q1, Computer Science, Artificial Intelligence)
https://www.sciencedirect.com/science/article/pii/S0933365725001228
Citations: 0
Abstract
Deep learning for automatic ICD coding: Review, opportunities and challenges
Background:
The automatic International Classification of Diseases (ICD) coding task assigns unique medical codes to diseases in clinical texts for downstream uses such as data statistics, quality control, and billing. The efficiency and accuracy of medical code assignment are a significant challenge for healthcare. In clinical practice, Electronic Health Record (EHR) data are usually complex, heterogeneous, non-standard and unstructured, and the manual coding process is time-consuming, laborious and error-prone. Traditional machine learning methods struggle to extract significant semantic information from clinical texts accurately, but recent progress in Deep Learning (DL) has shown promising results in addressing these issues.
Objective:
This paper comprehensively reviewed recent advancements in deep learning for automatic ICD coding. It aimed to reveal prominent challenges and emerging development trends by summarizing and analyzing each model's publication year, design motivation, deep neural network architecture, and auxiliary data.
Methods:
This review systematically surveyed the literature on automatic ICD coding methods based on deep learning. We screened 5 online databases (Web of Science, SpringerLink, PubMed, ACM, and IEEE digital library) and collected 53 articles on deep learning-based ICD coding published from 2017 to 2023.
Results:
These deep neural network methods aimed to overcome challenges such as lengthy and noisy clinical text, high dimensionality and functional relationships of medical codes, and long-tail label distribution. Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), attention mechanisms, Transformers, and Pre-trained Language Models (PLMs) have become popular for addressing prominent issues in ICD coding. Meanwhile, introducing medical ontology from within the ICD coding system (code descriptions and code hierarchy) and external knowledge (Wikipedia articles, tabular data, Clinical Classification Software (CCS), PLMs fine-tuned on biomedical corpora, entity recognition and concept extraction) has become an emerging trend for automatic ICD coding.
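As an illustration of how an attention mechanism can be used for multi-label code assignment, the sketch below implements a generic label-wise attention step in plain Python. It is not taken from any of the reviewed papers: the function names, the toy vectors, and the 0.5 decision threshold are all assumptions made for the example. Each candidate code has its own query vector, which weights the tokens of a note before a per-code sigmoid decides whether the code is assigned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def label_attention(token_vecs, label_queries):
    """Build one document vector per label (hypothetical helper).

    token_vecs: one vector per token of the clinical note.
    label_queries: one query vector per candidate code.
    Returns (doc_vecs, weights), both indexed by label.
    """
    dim = len(token_vecs[0])
    doc_vecs, all_weights = [], []
    for query in label_queries:
        # Score every token against this label's query, then normalise.
        weights = softmax([dot(query, h) for h in token_vecs])
        # Weighted sum of token vectors gives a label-specific document vector.
        doc = [sum(w * h[d] for w, h in zip(weights, token_vecs))
               for d in range(dim)]
        doc_vecs.append(doc)
        all_weights.append(weights)
    return doc_vecs, all_weights


def predict(token_vecs, label_queries, label_out, biases, threshold=0.5):
    """Independent sigmoid decision per code (multi-label output)."""
    doc_vecs, _ = label_attention(token_vecs, label_queries)
    probs = [1.0 / (1.0 + math.exp(-(dot(w, v) + b)))
             for w, v, b in zip(label_out, doc_vecs, biases)]
    return [p >= threshold for p in probs], probs


# Toy input: two tokens in a 2-dimensional space, two candidate codes.
tokens = [[1.0, 0.0], [0.0, 1.0]]
queries = [[5.0, 0.0], [0.0, 5.0]]
out_weights = [[2.0, 0.0], [0.0, -2.0]]
out_biases = [0.0, 0.0]

doc_vecs, attn = label_attention(tokens, queries)
decisions, probs = predict(tokens, queries, out_weights, out_biases)
```

In this toy setup the first code attends almost entirely to the first token and the second code to the second token, so each code gets its own view of the same note. Real systems would learn the queries and output weights from data and would typically obtain the token vectors from a CNN, RNN, or Transformer encoder.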
Conclusion:
This paper provided a comprehensive review of recent literature on applying deep learning technology to improve medical code assignment. Multiple neural network methods (CNNs, RNNs, Transformers, PLMs, and especially attention mechanisms) have been successfully applied to ICD coding tasks and have achieved excellent performance. Various medical auxiliary data have also proven valuable in enhancing model feature representation and classification performance. Our in-depth and systematic analysis suggested that automatic ICD coding based on deep learning has a bright future in healthcare. Finally, we discussed some major challenges and outlined future development directions.
About the journal:
Artificial Intelligence in Medicine publishes original articles from a wide variety of interdisciplinary perspectives concerning the theory and practice of artificial intelligence (AI) in medicine, medically-oriented human biology, and health care.
Artificial intelligence in medicine may be characterized as the scientific discipline pertaining to research studies, projects, and applications that aim at supporting decision-based medical tasks through knowledge- and/or data-intensive computer-based solutions that ultimately support and improve the performance of a human care provider.