Snomed2Vec: Representation of SNOMED CT Terms with Word2Vec

Proceedings. IEEE International Symposium on Computer-Based Medical Systems, pp. 678-683. Pub Date: 2019-06-01. DOI: 10.1109/CBMS.2019.00138

I. M. Soriano, J. Castro, J. Fernández-Breis, I. S. Román, A. A. Barriuso, David Guevara Baraza


Citations: 9

Abstract



Hospital Information Systems (HIS) use Electronic Health Records to store heterogeneous patient data. An important goal of such systems is that the information be normalized and coded with a clinical terminology so that it represents the healthcare meaning exactly. This process usually requires human experts to identify and map the correct concept, which is a slow and tedious task. One of the most widespread clinical terminologies, and one with strong prospects, is SNOMED CT: a multilingual, ontology-based clinical terminology that represents each clinical concept with a unique code. In this paper we introduce Snomed2Vec, a new semantic search tool for finding the most similar concepts in SNOMED CT. It is an ontology-based named entity recognition system that uses word embeddings to suggest the SNOMED CT concepts most similar to those that appear in a text. To evaluate the tool we propose two kinds of validation: one against a gold-standard corpus of diagnoses from clinical reports, and a social validation through free public web access. We provide web access so that the academic community can use, test, and validate the tool. The validation results show that this process helps specialists choose the correct concepts from SNOMED CT. The paper illustrates 1) how we create the initial large text corpus used to train the word2vec models, 2) how we use this vector space model to create our final Snomed2Vec vector space model, and 3) the use of cosine similarity to obtain the most similar concepts, grouped by SNOMED CT hierarchy. The public web tool and the notebook used to develop and test this work are available to the academic community at https://github.com/NachusS/Snomed2Vec.
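
The cosine-similarity step described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy three-dimensional word vectors, the placeholder concept identifiers (`C1`, `C2`, `C3`, which are not real SNOMED CT codes), and the choice to represent a term by the average of its word vectors. The abstract does not state how the authors build concept vectors, so this should not be read as their implementation.

```python
import math
from collections import defaultdict

# Toy word vectors, invented for illustration (not output of a trained model).
WORD_VECTORS = {
    "heart": [0.9, 0.1, 0.0],
    "attack": [0.7, 0.3, 0.1],
    "myocardial": [0.85, 0.15, 0.05],
    "infarction": [0.75, 0.25, 0.1],
    "bypass": [0.2, 0.9, 0.1],
    "surgery": [0.1, 0.95, 0.2],
}

# Hypothetical concepts: (placeholder id, term, hierarchy).
CONCEPTS = [
    ("C1", "myocardial infarction", "disorder"),
    ("C2", "heart surgery", "procedure"),
    ("C3", "bypass surgery", "procedure"),
]


def mean_vector(text):
    """Average the vectors of the known words in text; None if none are known."""
    vecs = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vecs:
        return None
    return [sum(column) / len(vecs) for column in zip(*vecs)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query, top_n=2):
    """Rank concepts by cosine similarity to the query, grouped by hierarchy."""
    query_vec = mean_vector(query)
    if query_vec is None:
        return {}
    grouped = defaultdict(list)
    for concept_id, term, hierarchy in CONCEPTS:
        concept_vec = mean_vector(term)
        if concept_vec is not None:
            score = cosine(query_vec, concept_vec)
            grouped[hierarchy].append((score, concept_id, term))
    # Highest similarity first within each hierarchy.
    return {h: sorted(items, reverse=True)[:top_n] for h, items in grouped.items()}


if __name__ == "__main__":
    for hierarchy, hits in most_similar("heart attack").items():
        print(hierarchy, [(cid, term, round(score, 3)) for score, cid, term in hits])
```

In this toy setup the query "heart attack" ranks the hypothetical concept `C1` ("myocardial infarction") first within the "disorder" group, even though the two phrases share no words. That is the behaviour a word-embedding search is meant to provide over plain string matching.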
