{"title":"STMC: Semantic Tag Medical Concept Using Word2Vec Representation","authors":"I. M. Soriano, J. Castro","doi":"10.1109/CBMS.2018.00075","DOIUrl":null,"url":null,"abstract":"In this paper we propose a recognition system of medical concepts from free text clinical reports. Our approach tries to recognize also concepts which are named with local terminology, with medical writing scripts, short words, abbreviations and even spelling mistakes. We consider a clinical terminology ontology (Snomed-CT), as a dictionary of concepts. In a first step we obtain an embedding model using word2vec methodology from a big corpus database of clinical reports. Word vectors are positioned in the vector space such that words that share common contexts in the corpus are located in close proximity to one another in the space, and so the geometrical similarity can be considered a measure of semantic relation. We have considered 615513 emergency clinical reports from the Hospital \"Rafael Mendez\" in Lorca, Murcia. In these reports there are a lot of local language of the emergency domain, medical writing scripts, short words, abbreviations and even spelling mistakes. With the model obtained we represent the words and sentences as vectors, and by applying cosine similarity we identify which concepts of the ontology are named in the text. Finally, we represent the clinical reports (EHR) like a bag of concepts, and use this representation to search similar documents. The paper illustrates 1) how we build the word2vec model from the free text clinical reports, 2) How we extend the embedding from words to sentences, and 3) how we use the cosine similarity to identify concepts. The experimentation, and expert human validation, shows that: a) the concepts named in the text with the ontology terminology are well recognized, and b) others concepts that are not named with the ontology terminology are also recognized, obtaining a high precision and recall measures.","PeriodicalId":74567,"journal":{"name":"Proceedings. IEEE International Symposium on Computer-Based Medical Systems","volume":"32 1","pages":"393-398"},"PeriodicalIF":0.0000,"publicationDate":"2018-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. IEEE International Symposium on Computer-Based Medical Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CBMS.2018.00075","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3
Abstract
In this paper we propose a system for recognizing medical concepts in free-text clinical reports. Our approach also recognizes concepts that are named with local terminology, medical shorthand, short words, abbreviations, and even spelling mistakes. We use a clinical terminology ontology (SNOMED CT) as a dictionary of concepts. In a first step we obtain an embedding model using the word2vec methodology from a large corpus of clinical reports. Word vectors are positioned in the vector space so that words sharing common contexts in the corpus lie close to one another, and thus geometric similarity can be taken as a measure of semantic relatedness. We considered 615,513 emergency clinical reports from the Hospital "Rafael Mendez" in Lorca, Murcia; these reports contain a great deal of local emergency-domain language, medical shorthand, short words, abbreviations, and spelling mistakes. With the resulting model we represent words and sentences as vectors, and by applying cosine similarity we identify which concepts of the ontology are mentioned in the text. Finally, we represent each clinical report (EHR) as a bag of concepts and use this representation to search for similar documents. The paper illustrates 1) how we build the word2vec model from the free-text clinical reports, 2) how we extend the embedding from words to sentences, and 3) how we use cosine similarity to identify concepts. The experiments, together with expert human validation, show that: a) concepts named in the text with the ontology terminology are recognized well, and b) concepts that are not named with the ontology terminology are also recognized, yielding high precision and recall.
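As a concrete illustration of steps 1)-3), the sketch below shows the general pipeline the abstract describes: train a word2vec model on tokenized report text, embed a phrase as a vector, and match it against an ontology term by cosine similarity. The use of gensim, the averaging of word vectors for phrase embeddings, the toy tokens, and the helper function names are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): word2vec on tokenized reports,
# phrase vectors by averaging word vectors, cosine similarity to a concept term.
import numpy as np
from gensim.models import Word2Vec

# Toy stand-in for tokenized clinical reports (the real corpus is free-text EHRs).
corpus = [
    ["dolor", "toracico", "irradiado", "a", "brazo", "izquierdo"],
    ["dlr", "torac", "y", "disnea"],          # abbreviations / misspellings
    ["cefalea", "intensa", "con", "nauseas"],
]

# 1) Train the embedding model on the report corpus.
model = Word2Vec(corpus, vector_size=100, window=5, min_count=1, sg=1, epochs=50)

def sentence_vector(tokens, wv):
    """Embed a phrase as the mean of its in-vocabulary word vectors (assumed strategy)."""
    vecs = [wv[t] for t in tokens if t in wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(wv.vector_size)

def cosine(u, v):
    """Cosine similarity between two vectors."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(np.dot(u, v) / denom) if denom else 0.0

# 2)-3) Compare a text span against an ontology concept description
#        and tag the concept if the similarity exceeds a chosen threshold.
span = sentence_vector(["dlr", "torac"], model.wv)
concept = sentence_vector(["dolor", "toracico"], model.wv)  # e.g. a SNOMED CT term
print("similarity:", cosine(span, concept))
```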