Extracting clinical information from chest x-ray reports: A case study for Russian language

Evgenia Kivotova, B. Maksudov, R. Kuleev, B. Ibragimov
Published in: 2020 International Conference Nonlinearity, Information and Robotics (NIR)
Publication date: 2020-12-03
DOI: 10.1109/NIR50484.2020.9290235 (https://doi.org/10.1109/NIR50484.2020.9290235)
Citations: 1

Abstract

In this paper, we analyze possible approaches to diagnosis identification in Russian-language medical reports. First, we describe the main problems of preprocessing raw Russian medical reports. Second, focusing on the embedding extraction method, we analyze several publicly available models and find that BERT is a promising instrument for this task. In this first attempt to build an NLP system for Russian medical report classification based on embedding extraction, we identify the main weaknesses that limit the use of existing publicly available Russian NLP models in the medical-text domain. Since no labeled data are available, we evaluate each model visually by analyzing its embeddings in a 2D space obtained through dimensionality reduction with t-SNE. We assume that a good model places reports describing the same diagnosis close to each other and reports with distinct diagnoses far apart, so that clusters form. Finally, we propose several directions for future research that we believe will improve on the results achieved in this field so far.
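
The clustering assumption in the abstract can be made concrete with a small sketch. Everything below is illustrative: the vectors and diagnosis labels are invented toy values, not data or results from the paper, and the helper names `cosine_distance` and `separation` are hypothetical. The paper itself evaluates models by visual inspection of t-SNE plots and has no labels; this sketch only shows what "same diagnosis close, distinct diagnoses far" would mean numerically if a few reports were labeled by hand.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def separation(embeddings, labels):
    """Return (mean same-label distance, mean different-label distance)."""
    intra, inter = [], []
    for i, j in combinations(range(len(embeddings)), 2):
        d = cosine_distance(embeddings[i], embeddings[j])
        # Pairs sharing a label count as "intra"; all other pairs as "inter".
        (intra if labels[i] == labels[j] else inter).append(d)
    return sum(intra) / len(intra), sum(inter) / len(inter)


# Toy stand-ins for report embeddings (assumption: real ones would come
# from a model such as BERT and have hundreds of dimensions).
toy_embeddings = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.2],
    [0.0, 0.8, 0.3],
]
# Hypothetical hand-assigned diagnoses, used only for this illustration.
toy_labels = ["pneumonia", "pneumonia", "normal", "normal"]

intra_mean, inter_mean = separation(toy_embeddings, toy_labels)
print(f"mean intra-label distance: {intra_mean:.3f}")
print(f"mean inter-label distance: {inter_mean:.3f}")
```

Under the stated assumption, a model that clusters well would give a same-label distance clearly smaller than the different-label distance. Cosine distance is used here as one common choice for comparing embeddings; the paper does not specify a numeric metric.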