ICD code mapping model based on clinical text tree structure
Jingjin Xue, Pengli Lu
Artificial Intelligence in Medicine, Volume 167, Article 103163 (published 2025-05-26; impact factor 6.2; JCR Q1, Computer Science, Artificial Intelligence)
DOI: 10.1016/j.artmed.2025.103163
https://www.sciencedirect.com/science/article/pii/S0933365725000983
Citations: 0
Abstract
With the rapid development of big data and artificial intelligence technology, effective solutions have emerged for the ICD coding of electronic medical records. Deep learning methods, which replace manual coding, have improved coding quality and efficiency. However, they still face challenges, such as weak and ambiguous semantic representations of clinical record text and a failure to consider the structural characteristics of clinical records. To address these problems, this study proposes an ICD coding model, TRIC (TRansformer and TRee-LSTM for ICD Coding), which performs automatic ICD coding of unstructured clinical records. In this model, the structure of a clinical record is extracted by a constituency tree model and its features by a Transformer-based model, and a Tree-LSTM is then used to enrich the features. A pre-trained BioBERT model is then used to highlight the role of key ICD codes and improve matching performance. Finally, a fully connected neural network classifier produces the many-to-many mapping between clinical records and ICD codes. On the widely used MIMIC-III full and sample data sets, TRIC is compared with 12 benchmark models. Its best results are 0.586 micro-F1 (MiF), 0.109 macro-F1 (MaF), 0.989 micro-AUC (MiAUC), 0.937 macro-AUC (MaAUC) and 0.758 precision at 8 (P@8), which indicates that TRIC can effectively improve the quality of automatic ICD coding.
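For readers unfamiliar with the reported metrics, the following is a minimal sketch of how MiF, MaF and P@8 are commonly computed for multi-label ICD coding. It is not the authors' evaluation code. The function names, the 0.5 decision threshold and the toy data are assumptions made for illustration. Macro-F1 is taken here as the mean of per-code F1 scores; some ICD coding papers instead combine macro-averaged precision and recall, and the abstract does not say which variant was used.

```python
import numpy as np

def micro_macro_f1(y_true, y_pred):
    """Micro- and macro-F1 for binary matrices of shape (n_records, n_codes)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    # Per-code counts of true positives, false positives, false negatives.
    tp = (y_true & y_pred).sum(axis=0).astype(float)
    fp = (~y_true & y_pred).sum(axis=0).astype(float)
    fn = (y_true & ~y_pred).sum(axis=0).astype(float)

    def f1(tp, fp, fn):
        denom = 2 * tp + fp + fn
        # A code that is never true and never predicted scores 0 here.
        return np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 0.0)

    micro = float(f1(tp.sum(), fp.sum(), fn.sum()))  # pool counts over codes
    macro = float(f1(tp, fp, fn).mean())             # average per-code F1
    return micro, macro

def precision_at_k(y_true, scores, k=8):
    """Fraction of the k highest-scored codes per record that are correct."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    top_k = np.argsort(-scores, axis=1)[:, :k]
    hits = np.take_along_axis(y_true, top_k, axis=1)
    return float(hits.mean())

# Toy example: 2 records, 4 hypothetical codes.
y_true = np.array([[1, 0, 1, 0], [0, 1, 0, 0]])
scores = np.array([[0.9, 0.2, 0.4, 0.1], [0.3, 0.8, 0.6, 0.1]])
y_pred = scores >= 0.5  # assumed threshold
micro, macro = micro_macro_f1(y_true, y_pred)
p_at_2 = precision_at_k(y_true, scores, k=2)
```

Micro-averaging pools counts over all codes, so frequent codes dominate, while macro-averaging weights every code equally. With thousands of rare codes in the full MIMIC-III label set, this is the usual explanation for a macro-F1 far below the micro-F1.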
About the journal:
Artificial Intelligence in Medicine publishes original articles from a wide variety of interdisciplinary perspectives concerning the theory and practice of artificial intelligence (AI) in medicine, medically-oriented human biology, and health care.
Artificial intelligence in medicine may be characterized as the scientific discipline pertaining to research studies, projects, and applications that aim at supporting decision-based medical tasks through knowledge- and/or data-intensive computer-based solutions that ultimately support and improve the performance of a human care provider.