Automatic ICD-10 codes association to diagnosis: Bulgarian case

Boris Velichkov, Simeon Gerginov, P. Panayotov, S. Vassileva, Gerasim Velchev, I. Koychev, S. Boytcheva
CSBio '20: Proceedings of the Eleventh International Conference on Computational Systems-Biology and Bioinformatics, 19 November 2020
DOI: 10.1145/3429210.3429224 (https://doi.org/10.1145/3429210.3429224)
Citations: 5

Abstract

This paper presents an approach for automatically associating diagnoses written in Bulgarian with ICD-10 codes. Since this task is currently performed manually by medical professionals, automating it would save time and allow doctors to focus more on patient care. The presented approach employs a fine-tuned language model (BERT) as a multi-class classification model. As there are several different types of BERT models, we conduct experiments to assess the applicability of domain- and language-specific model adaptation. To train our models, we use a large corpus of about 350,000 textual descriptions of diagnoses in Bulgarian annotated with ICD-10 codes. We conduct experiments comparing the accuracy of ICD-10 code prediction using different types of BERT language models. The results show that the MultilingualBERT model (Top-1 accuracy 81%, macro F1 86%, top-5 MRR 88%) outperforms the other models. However, all models seem to suffer from the class imbalance in the training dataset. Given the very large number of classes and the noisiness of the data, the prediction accuracy achieved in the experiments can be considered very high. The results also provide evidence that the collected dataset and the proposed approach can be useful in building an application that helps medical practitioners with this task, and they encourage further research to improve the prediction accuracy of the models. By design, the proposed approach strives to be as language-independent as possible and can easily be adapted to other languages.
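The abstract reports three metrics (Top-1 accuracy, macro F1 and MRR over the top 5 predictions) without defining them. The sketch below shows one common way to compute them from per-class scores produced by a classifier. It is an illustration under stated assumptions, not the authors' evaluation code: class indices stand in for ICD-10 codes, macro F1 is computed from top-1 predictions over the union of gold and predicted labels, and MRR counts a reciprocal rank of 0 when the gold code falls outside the top k. The function names and the toy scores are hypothetical.

```python
from collections import defaultdict
from typing import Sequence

# Each row holds one score per candidate ICD-10 code (hypothetical class indices).
Scores = Sequence[Sequence[float]]


def ranked_classes(scores: Sequence[float], k: int) -> list[int]:
    """Indices of the k highest-scoring classes, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def top1_accuracy(all_scores: Scores, gold: Sequence[int]) -> float:
    """Share of examples whose highest-scoring class is the gold code."""
    hits = sum(ranked_classes(s, 1)[0] == g for s, g in zip(all_scores, gold))
    return hits / len(gold)


def macro_f1(all_scores: Scores, gold: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN) on top-1 predictions."""
    preds = [ranked_classes(s, 1)[0] for s in all_scores]
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for p, g in zip(preds, gold):
        if p == g:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gets a false positive
            fn[g] += 1  # gold class gets a false negative
    labels = set(gold) | set(preds)
    f1s = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        f1s.append(2 * tp[c] / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


def mrr_at_k(all_scores: Scores, gold: Sequence[int], k: int = 5) -> float:
    """Mean reciprocal rank of the gold code; 0 if it is not in the top k."""
    total = 0.0
    for s, g in zip(all_scores, gold):
        ranking = ranked_classes(s, k)
        if g in ranking:
            total += 1.0 / (ranking.index(g) + 1)
    return total / len(gold)


# Toy example: 4 diagnoses, 3 hypothetical codes.
scores = [
    [0.7, 0.2, 0.1],  # gold 0: ranked 1st
    [0.1, 0.3, 0.6],  # gold 1: ranked 2nd
    [0.2, 0.5, 0.3],  # gold 1: ranked 1st
    [0.5, 0.4, 0.1],  # gold 2: ranked 3rd
]
gold = [0, 1, 1, 2]

print(round(top1_accuracy(scores, gold), 3))   # 0.5
print(round(macro_f1(scores, gold), 3))        # 0.444
print(round(mrr_at_k(scores, gold, k=5), 3))   # 0.708
```

Because macro F1 averages per-class scores without weighting by frequency, rare ICD-10 codes count as much as common ones, which is one reason class imbalance shows up in this metric. In practice, scikit-learn's `f1_score(y_true, y_pred, average="macro")` gives the same macro F1 on top-1 predictions.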