Using a Large Open Clinical Corpus for Improved ICD-10 Diagnosis Coding.

AMIA ... Annual Symposium proceedings. AMIA Symposium Pub Date : 2024-01-11 eCollection Date: 2023-01-01
Anastasios Lamproudis, Therese Olsen Svenning, Torbjørn Torsvik, Taridzo Chomutare, Andrius Budrionis, Phuong Dinh Ngo, Thomas Vakili, Hercules Dalianis
{"title":"Using a Large Open Clinical Corpus for Improved ICD-10 Diagnosis Coding.","authors":"Anastasios Lamproudis, Therese Olsen Svenning, Torbjørn Torsvik, Taridzo Chomutare, Andrius Budrionis, Phuong Dinh Ngo, Thomas Vakili, Hercules Dalianis","doi":"","DOIUrl":null,"url":null,"abstract":"<p><p>With the recent advances in natural language processing and deep learning, the development of tools that can assist medical coders in ICD-10 diagnosis coding and increase their efficiency in coding discharge summaries is significantly more viable than before. To that end, one important component in the development of these models is the datasets used to train them. In this study, such datasets are presented, and it is shown that one of them can be used to develop a BERT-based language model that can consistently perform well in assigning ICD-10 codes to discharge summaries written in Swedish. Most importantly, it can be used in a coding support setup where a tool can recommend potential codes to the coders. This reduces the range of potential codes to consider and, in turn, reduces the workload of the coder. Moreover, the de-identified and pseudonymised dataset is open to use for academic users.</p>","PeriodicalId":72180,"journal":{"name":"AMIA ... Annual Symposium proceedings. AMIA Symposium","volume":"2023 ","pages":"465-473"},"PeriodicalIF":0.0000,"publicationDate":"2024-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10785868/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"AMIA ... Annual Symposium proceedings. AMIA Symposium","FirstCategoryId":"1085","ListUrlMain":"","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2023/1/1 0:00:00","PubModel":"eCollection","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

With the recent advances in natural language processing and deep learning, the development of tools that can assist medical coders in ICD-10 diagnosis coding and increase their efficiency in coding discharge summaries is significantly more viable than before. To that end, one important component in the development of these models is the datasets used to train them. In this study, such datasets are presented, and it is shown that one of them can be used to develop a BERT-based language model that can consistently perform well in assigning ICD-10 codes to discharge summaries written in Swedish. Most importantly, it can be used in a coding support setup where a tool can recommend potential codes to the coders. This reduces the range of potential codes to consider and, in turn, reduces the workload of the coder. Moreover, the de-identified and pseudonymised dataset is open to use for academic users.

使用大型开放式临床语料库改进 ICD-10 诊断编码。
随着自然语言处理和深度学习领域的最新进展,开发可协助医疗编码员进行 ICD-10 诊断编码并提高其出院摘要编码效率的工具比以前更加可行。为此,开发这些模型的一个重要组成部分就是用于训练这些模型的数据集。本研究介绍了此类数据集,结果表明,其中一个数据集可用于开发基于 BERT 的语言模型,该模型在为用瑞典语撰写的出院摘要分配 ICD-10 代码方面一直表现出色。最重要的是,该模型可用于编码支持设置,其中一个工具可向编码员推荐潜在的编码。这就减少了需要考虑的潜在编码范围,从而减轻了编码员的工作量。此外,去标识化和假名化的数据集可供学术用户使用。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信