Using a Large Open Clinical Corpus for Improved ICD-10 Diagnosis Coding
Anastasios Lamproudis, Therese Olsen Svenning, Torbjørn Torsvik, Taridzo Chomutare, Andrius Budrionis, Phuong Dinh Ngo, Thomas Vakili, Hercules Dalianis
AMIA ... Annual Symposium proceedings. AMIA Symposium, volume 2023, pages 465-473. Journal Article. Published 2024-01-11 (eCollection 2023).
Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10785868/pdf/
Abstract
Recent advances in natural language processing and deep learning have made it significantly more viable to develop tools that assist medical coders with ICD-10 diagnosis coding and increase their efficiency in coding discharge summaries. One important component in developing such models is the datasets used to train them. This study presents such datasets and shows that one of them can be used to develop a BERT-based language model that consistently performs well in assigning ICD-10 codes to discharge summaries written in Swedish. Most importantly, the model can be used in a coding support setup in which a tool recommends potential codes to the coders. This narrows the range of codes to consider and, in turn, reduces the coder's workload. Moreover, the de-identified and pseudonymised dataset is open for use by academic users.
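The coding support setup described in the abstract can be illustrated with a minimal sketch. It assumes a classifier that emits one raw score (logit) per candidate ICD-10 code for a discharge summary. The function name `recommend_codes`, the example codes, and the scores are hypothetical and chosen only for illustration; they are not taken from the paper, and this is not the authors' implementation.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def recommend_codes(logits, code_labels, k=3):
    """Return the k highest-scoring codes, best first.

    logits      -- one raw model score per candidate code (assumed input)
    code_labels -- the candidate ICD-10 codes, in the same order as logits
    k           -- how many suggestions to show the coder
    """
    if len(logits) != len(code_labels):
        raise ValueError("logits and code_labels must have the same length")
    scored = [(label, sigmoid(z)) for label, z in zip(code_labels, logits)]
    # Sort by score, highest first, and keep only the top k suggestions.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical scores for four illustrative codes.
labels = ["K35", "K80", "K57", "K21"]
scores = [2.0, -1.0, 0.5, -3.0]
suggestions = recommend_codes(scores, labels, k=2)
for code, prob in suggestions:
    print(f"{code}: {prob:.2f}")
```

Presenting only the top-ranked candidates is one simple way to narrow the range of codes a coder has to consider; the value of `k` and any score threshold would have to be chosen for the specific setting.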