STMC: Semantic Tag Medical Concept Using Word2Vec Representation

I. M. Soriano, J. Castro
Proceedings. IEEE International Symposium on Computer-Based Medical Systems, vol. 32, no. 1, pp. 393-398, 2018-06-18. DOI: 10.1109/CBMS.2018.00075
Citations: 3

Abstract

In this paper we propose a system for recognizing medical concepts in free-text clinical reports. Our approach also aims to recognize concepts that are expressed in local terminology, medical writing conventions, short words, abbreviations, and even spelling mistakes. We use a clinical terminology ontology (SNOMED CT) as a dictionary of concepts. In a first step, we train a word2vec embedding model on a large corpus of clinical reports. Word2vec positions word vectors so that words sharing common contexts in the corpus lie close to one another in the vector space; geometric similarity can therefore serve as a measure of semantic relatedness. We used 615,513 emergency clinical reports from the Hospital "Rafael Mendez" in Lorca, Murcia. These reports contain a great deal of local emergency-department language, medical writing conventions, short words, abbreviations, and even spelling mistakes. With the resulting model, we represent words and sentences as vectors and, by applying cosine similarity, identify which ontology concepts are mentioned in the text. Finally, we represent each clinical report (EHR) as a bag of concepts and use this representation to search for similar documents. The paper illustrates 1) how we build the word2vec model from the free-text clinical reports, 2) how we extend the embedding from words to sentences, and 3) how we use cosine similarity to identify concepts. Experiments with expert human validation show that a) concepts named in the text with the ontology terminology are recognized well, and b) concepts that are not named with the ontology terminology are also recognized, with high precision and recall.
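The abstract describes the pipeline only at a high level. The following is a minimal Python sketch of the last three stages (sentence embedding, cosine-based concept tagging, and bag-of-concepts report search) under several assumptions that are not taken from the paper: the word vectors are a hypothetical, hand-made stand-in for a word2vec model trained on the report corpus (in practice one might train such a model with a library like gensim); a phrase is embedded as the mean of its word vectors; a concept is tagged when some window of the sentence, as long as the concept's term, reaches an assumed cosine threshold of 0.9; and `C_CHEST_PAIN` / `C_FEVER` are hypothetical placeholders, not real SNOMED CT identifiers. The paper's actual embedding and matching rules may differ.

```python
import math
import re
from collections import Counter

# Hypothetical toy vectors standing in for a word2vec model trained on clinical
# reports. Values are invented: the synonym "thoracic" and the misspelling "feber"
# are placed near the canonical terms, as word2vec would tend to do for words
# that share contexts.
WORD_VECTORS = {
    "chest": [1.0, 0.0, 0.1],
    "thoracic": [0.95, 0.0, 0.15],
    "pain": [0.9, 0.1, 0.0],
    "ache": [0.85, 0.1, 0.05],
    "fever": [0.0, 1.0, 0.0],
    "feber": [0.05, 0.95, 0.0],  # misspelling of "fever"
    "pyrexia": [0.0, 0.9, 0.1],
    "patient": [0.1, 0.1, 1.0],
    "reports": [0.1, 0.1, 0.9],
}

# Hypothetical concept dictionary (identifier -> preferred term).
CONCEPTS = {"C_CHEST_PAIN": "chest pain", "C_FEVER": "fever"}


def tokenize(text):
    """Lowercase, extract alphabetic tokens, and keep only in-vocabulary words."""
    words = re.findall(r"[a-záéíóúñü]+", text.lower())
    return [w for w in words if w in WORD_VECTORS]


def embed(tokens):
    """Embed a non-empty token list as the mean of its word vectors."""
    dim = len(WORD_VECTORS[tokens[0]])
    return [sum(WORD_VECTORS[t][i] for t in tokens) / len(tokens) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def tag_concepts(sentence, threshold=0.9):
    """Return ids of concepts whose term embedding matches a window of the sentence."""
    tokens = tokenize(sentence)
    found = set()
    if not tokens:
        return found
    for cid, term in CONCEPTS.items():
        term_tokens = tokenize(term)
        if not term_tokens:
            continue
        term_vec = embed(term_tokens)
        k = len(term_tokens)
        # Windows as long as the concept term; a shorter sentence forms one window.
        windows = [tokens[i:i + k] for i in range(len(tokens) - k + 1)] or [tokens]
        if max(cosine(embed(w), term_vec) for w in windows) >= threshold:
            found.add(cid)
    return found


def bag_of_concepts(report):
    """Count the concepts tagged in each sentence of a report."""
    bag = Counter()
    for sentence in re.split(r"[.;\n]+", report):
        bag.update(tag_concepts(sentence))
    return bag


def report_similarity(bag_a, bag_b):
    """Cosine similarity between two bag-of-concepts count vectors."""
    dot = sum(n * bag_b[c] for c, n in bag_a.items())
    na = math.sqrt(sum(n * n for n in bag_a.values()))
    nb = math.sqrt(sum(n * n for n in bag_b.values()))
    return dot / (na * nb) if na and nb else 0.0


if __name__ == "__main__":
    a = bag_of_concepts("Patient reports thoracic ache. Feber since yesterday.")
    b = bag_of_concepts("Chest pain on exertion.")
    print(a, b)
    print(round(report_similarity(a, b), 3))  # 0.707
```

In this toy setup, "thoracic ache" is tagged as the chest-pain concept and the misspelled "feber" as fever even though neither wording appears in the concept dictionary, which mirrors the behavior the abstract reports for non-ontology terminology. Out-of-vocabulary words ("since", "exertion") are simply ignored here; that is a simplification of this sketch.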