Extracting clinical information from chest x-ray reports: A case study for Russian language

2020 International Conference Nonlinearity, Information and Robotics (NIR). Published: 2020-12-03. DOI: 10.1109/NIR50484.2020.9290235

Evgenia Kivotova, B. Maksudov, R. Kuleev, B. Ibragimov


Citations: 1

Abstract

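In this paper, we analyze possible approaches to identifying diagnoses in Russian medical reports. First, we introduce the main problems in preprocessing raw Russian medical reports. Second, focusing on the embedding extraction method, we analyze several publicly available models and find that the BERT model is a promising instrument for this task. In this first attempt to build an NLP system for classifying Russian medical reports based on embedding extraction, we identify the main weaknesses that limit the use of existing publicly available Russian NLP models in the medical-text domain. Because no labeled data are available, we evaluate each model visually, analyzing the report embeddings after projecting them into a 2D space with t-SNE dimensionality reduction. We assume that a good model will place reports describing the same diagnosis close to each other while placing reports with different diagnoses far apart, thereby forming clusters. Finally, we propose several directions for future research that we believe will improve on the results achieved in this field so far.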
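The clustering assumption above can also be checked numerically once at least a few reports carry diagnosis labels. The sketch below is not the authors' method (they judged t-SNE plots visually); it is a hypothetical `separation_ratio` helper, assuming each report has already been turned into a fixed-length embedding vector (for example, by a BERT model) and that cosine distance is a reasonable way to compare reports. A ratio well below 1 means reports with the same diagnosis sit closer together than reports with different diagnoses.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def separation_ratio(embeddings, diagnoses):
    """Hypothetical helper: mean intra-diagnosis distance / mean inter-diagnosis distance.

    Lower is better: a value below 1 means same-diagnosis reports are,
    on average, closer to each other than to reports with other diagnoses.
    """
    intra, inter = [], []
    # Compare every unordered pair of reports exactly once.
    for i, j in combinations(range(len(embeddings)), 2):
        d = cosine_distance(embeddings[i], embeddings[j])
        if diagnoses[i] == diagnoses[j]:
            intra.append(d)
        else:
            inter.append(d)
    if not intra or not inter:
        raise ValueError("need both same- and different-diagnosis pairs")
    return (sum(intra) / len(intra)) / (sum(inter) / len(inter))


# Toy 2D "report embeddings" and labels, invented for illustration (not from the paper).
reports = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
good_labels = ["pneumonia", "pneumonia", "normal", "normal"]
mixed_labels = ["pneumonia", "normal", "pneumonia", "normal"]

print(separation_ratio(reports, good_labels))   # below 1: clusters match the diagnoses
print(separation_ratio(reports, mixed_labels))  # above 1: clusters do not match
```

Unlike a t-SNE plot, whose 2D layout can distort distances between clusters, this ratio is computed in the original embedding space; scikit-learn's silhouette score is an established alternative for the same purpose.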



