Evgenia Kivotova, B. Maksudov, R. Kuleev, B. Ibragimov
2020 International Conference Nonlinearity, Information and Robotics (NIR). Published 2020-12-03. DOI: 10.1109/NIR50484.2020.9290235 (https://doi.org/10.1109/NIR50484.2020.9290235)
Extracting clinical information from chest x-ray reports: A case study for Russian language
In this paper, we analyze possible approaches to identifying diagnoses in Russian medical reports. First, we describe the main problems in preprocessing raw Russian medical reports. Second, focusing on the embedding extraction method, we analyze several publicly available models and find that BERT is a promising tool for this task. In this first attempt to build an NLP system for Russian medical report classification based on embedding extraction, we identify the main weaknesses that limit the use of existing publicly available Russian NLP models in the medical-text domain. Because no labeled data are available, we evaluate each model visually, inspecting the embeddings in a 2D space obtained by dimensionality reduction with t-SNE. We assume that a good model places reports describing the same diagnosis close to each other and reports with distinct diagnoses far apart, so that clusters form. Finally, we propose several directions for future research that we believe will improve on the results achieved in this field so far.
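The clustering assumption stated in the abstract can be illustrated numerically on toy data. The sketch below is not the authors' procedure (the paper inspects t-SNE projections by eye and has no labels); it only shows, under stated assumptions, what "same diagnosis close, distinct diagnoses far" means for embedding vectors. The vectors, the group labels, and the helper names `cosine` and `separation` are all made up for illustration.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def separation(embeddings, groups):
    """Return (mean within-group similarity, mean between-group similarity)."""
    within, between = [], []
    for i, j in combinations(range(len(embeddings)), 2):
        sim = cosine(embeddings[i], embeddings[j])
        # Pairs sharing a group label count as "within", all others as "between".
        (within if groups[i] == groups[j] else between).append(sim)
    return sum(within) / len(within), sum(between) / len(between)


# Toy 3-dimensional "report embeddings" (made up; real BERT vectors
# have hundreds of dimensions).
embeddings = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.2],
    [0.0, 0.8, 0.3],
]
# Hypothetical diagnosis groups, used only to illustrate the assumption.
groups = ["A", "A", "B", "B"]

within, between = separation(embeddings, groups)
print(f"within={within:.3f} between={between:.3f}")
```

Under the abstract's assumption, a model suited to the task would yield a within-group similarity clearly above the between-group similarity. Without labels, as in the paper, the same property can only be judged visually from the 2D projection.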