ICD code mapping model based on clinical text tree structure

Impact factor 6.2; Zone 2 (Medicine); JCR Q1 (Computer Science, Artificial Intelligence)
Jingjin Xue, Pengli Lu
DOI: 10.1016/j.artmed.2025.103163
Journal: Artificial Intelligence in Medicine, volume 167, Article 103163
Published: 2025-05-26
Publication type: Journal Article
Open access: no
Source: https://www.sciencedirect.com/science/article/pii/S0933365725000983
Citations: 0

Abstract

With the rapid development of big data and artificial intelligence technology, the ICD coding problem of electronic medical records has been addressed effectively. Deep learning methods, which replace manual coding, have improved both the quality and the efficiency of coding. However, they still face challenges, such as poor and ambiguous semantic representation of clinical record text and a failure to consider the structural characteristics of clinical records. To address these problems, this study proposes an ICD coding model, TRIC (TRansformer and TRee-LSTM for ICD Coding), which enables automatic ICD coding of unstructured clinical records. In this model, the structure of a clinical record is extracted by a constituency tree model and its features by a transformer-based model, and a Tree-LSTM model is used to enrich the features. A BioBERT pre-trained model is then used to highlight the role of key ICD codes and improve matching performance. Finally, a fully connected neural network classifier produces the many-to-many mapping between clinical records and ICD codes. On the widely used MIMIC-III full and sample data sets, TRIC is compared with 12 benchmark models. The best results obtained were 0.586 micro-F1 (MiF), 0.109 macro-F1 (MaF), 0.989 micro-AUC (MiAUC), 0.937 macro-AUC (MaAUC), and 0.758 precision at 8 (P@8), which indicates that TRIC can effectively improve the quality of automatic ICD coding.
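
The abstract reports micro-F1, macro-F1, and P@8 for a many-to-many (multi-label) mapping between records and ICD codes. The sketch below shows one common way to compute these three metrics in plain Python. It is not the authors' evaluation code: the function names, the set-based input format, and the toy codes "A", "B", "C" are assumptions made for illustration. Macro-F1 here is the mean of per-code F1 scores; some evaluation scripts instead average precision and recall per code first, which can give different numbers.

```python
from typing import Dict, Sequence, Set


def micro_f1(true_sets: Sequence[Set[str]], pred_sets: Sequence[Set[str]]) -> float:
    """Micro-F1: pool TP/FP/FN over every (record, code) pair, then compute F1."""
    tp = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    fp = sum(len(p - t) for t, p in zip(true_sets, pred_sets))
    fn = sum(len(t - p) for t, p in zip(true_sets, pred_sets))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(true_sets: Sequence[Set[str]], pred_sets: Sequence[Set[str]],
             labels: Sequence[str]) -> float:
    """Macro-F1 (assumed definition): unweighted mean of per-code F1 scores."""
    scores = []
    for code in labels:
        tp = sum(1 for t, p in zip(true_sets, pred_sets) if code in t and code in p)
        fp = sum(1 for t, p in zip(true_sets, pred_sets) if code not in t and code in p)
        fn = sum(1 for t, p in zip(true_sets, pred_sets) if code in t and code not in p)
        denom = 2 * tp + fp + fn
        # A code that is never true and never predicted contributes 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def precision_at_k(true_sets: Sequence[Set[str]],
                   score_dicts: Sequence[Dict[str, float]], k: int = 8) -> float:
    """P@k: fraction of the k highest-scored codes that are correct, averaged over records."""
    if not true_sets:
        return 0.0
    total = 0.0
    for t, scores in zip(true_sets, score_dicts):
        top_k = sorted(scores, key=scores.get, reverse=True)[:k]
        total += sum(1 for code in top_k if code in t) / k
    return total / len(true_sets)


# Toy example with hypothetical codes; real ICD label spaces are far larger.
true_codes = [{"A", "B"}, {"B"}]
pred_codes = [{"A"}, {"B", "C"}]
code_scores = [{"A": 0.9, "B": 0.2, "C": 0.4}, {"A": 0.1, "B": 0.8, "C": 0.7}]

print("micro-F1:", micro_f1(true_codes, pred_codes))
print("macro-F1:", macro_f1(true_codes, pred_codes, ["A", "B", "C"]))
print("P@2:", precision_at_k(true_codes, code_scores, k=2))
```

Micro-F1 is dominated by frequent codes, while macro-F1 weights every code equally, so a large label space with many rare codes tends to pull macro-F1 well below micro-F1. This is consistent with the gap between the reported MiF (0.586) and MaF (0.109).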
Source journal
Artificial Intelligence in Medicine
Category: Engineering and Technology, Engineering: Biomedical
CiteScore: 15.00
Self-citation rate: 2.70%
Articles published: 143
Review time: 6.3 months
About the journal: Artificial Intelligence in Medicine publishes original articles from a wide variety of interdisciplinary perspectives concerning the theory and practice of artificial intelligence (AI) in medicine, medically-oriented human biology, and health care. Artificial intelligence in medicine may be characterized as the scientific discipline pertaining to research studies, projects, and applications that aim at supporting decision-based medical tasks through knowledge- and/or data-intensive computer-based solutions that ultimately support and improve the performance of a human care provider.