Standard NER Tagging Scheme for Big Data Healthcare Analytics Built on Unified Medical Corpora

Sarah Shafqat, Hammad Majeed, Qaisar Javaid, H. F. Ahmad
DOI: 10.37965/jait.2022.0127
URL: https://doi.org/10.37965/jait.2022.0127
Journal: 人工智能技术学报(英文)
Publication date: 2022-08-22
Publication type: Journal Article
Source platform: Semanticscholar
Citations: 11

Abstract

Standard NER Tagging Scheme for Big Data Healthcare Analytics Built on Unified Medical Corpora
This research is motivated by a gap: analytics used for diagnosing, recommending, prescribing, or treating patients lack a common ground for medical context learning based on uniform phenotype features drawn from patients' profiles. While searching for possible solutions, we found that no unified corpora tagged with medical nomenclature were available to train analytics for medical context learning. We therefore demonstrate a mechanism for producing uniformly NER (Named Entity Recognition) tagged medical corpora. One corpus is built from a CSV dataset of 14,407 endocrine patients diagnosed with diabetes mellitus (DM) and comorbid diseases. The other is the ICD-10-CM coding scheme in text format, taken from www.icd10data.com. The ICD-10-CM corpus is to be tagged so that medical context can be understood uniformly, and to that end we are conducting experiments with common NLP techniques and frameworks such as TensorFlow, Keras, LSTM, and Bi-LSTM. In our preliminary experiments, label sets in the form of (instance, label) pairs were tagged with a Sequential() model built on TensorFlow.Keras and Bi-LSTM NLP algorithms. The maximum accuracy achieved in model validation was 0.8846.
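
The abstract does not spell out how the (instance, label) pairs are formed, so the following is only a minimal sketch of one plausible preprocessing step, using the Python standard library. Everything named here is an assumption for illustration: the sample rows, the choice of the three-character ICD-10-CM category as the label, and the hypothetical helpers `to_pairs`, `build_vocab`, and `encode`. It is not the authors' pipeline.

```python
from collections import Counter

# Illustrative ICD-10-CM lines in "CODE description" form.
# These are sample rows chosen for the sketch, not the paper's corpus.
ICD_LINES = [
    "E10.9 Type 1 diabetes mellitus without complications",
    "E11.9 Type 2 diabetes mellitus without complications",
    "E66.9 Obesity, unspecified",
    "I10 Essential (primary) hypertension",
]


def to_pairs(lines):
    """Split each line into an (instance, label) pair.

    Assumption: the label is the 3-character ICD-10-CM category (e.g. "E11").
    """
    pairs = []
    for line in lines:
        code, description = line.split(maxsplit=1)
        pairs.append((description.lower(), code[:3]))
    return pairs


def build_vocab(texts):
    """Map each token to an integer id; 0 is padding, 1 is unknown."""
    counts = Counter(tok for text in texts for tok in text.split())
    vocab = {"<pad>": 0, "<unk>": 1}
    for tok in sorted(counts):
        vocab[tok] = len(vocab)
    return vocab


def encode(text, vocab, max_len):
    """Convert text to a fixed-length id sequence (truncate or right-pad)."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in text.split()][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


pairs = to_pairs(ICD_LINES)
vocab = build_vocab(text for text, _ in pairs)
label_ids = {label: i for i, label in enumerate(sorted({lab for _, lab in pairs}))}

# X holds padded token-id sequences, y holds integer class labels.
X = [encode(text, vocab, max_len=8) for text, _ in pairs]
y = [label_ids[label] for _, label in pairs]
```

Padded integer sequences of this shape are the usual input to an embedding layer followed by a bidirectional LSTM, which is the kind of Sequential() model the abstract mentions. The model itself is left out here because the abstract gives no layer sizes or training settings.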