Sarah Shafqat, Hammad Majeed, Qaisar Javaid, H. F. Ahmad
Title: Standard NER Tagging Scheme for Big Data Healthcare Analytics Built on Unified Medical Corpora
Journal: Journal of Artificial Intelligence and Technology (人工智能技术学报(英文))
Published: 2022-08-22
DOI: 10.37965/jait.2022.0127 (https://doi.org/10.37965/jait.2022.0127)
Citations: 11
Abstract
This research is motivated by the lack of a common ground for medical context learning through analytics, whether the purpose is diagnosing, recommending, prescribing, or treating patients, based on uniform phenotype features drawn from patients' profiles. While searching for possible solutions, we found that no unified corpora tagged with medical nomenclature were available for training analytics for medical context learning. We therefore demonstrate a mechanism for building uniformly NER (Named Entity Recognition) tagged medical corpora. One corpus is fed with a dataset of 14407 endocrine patients, in CSV format, diagnosed with DM and comorbid diseases. The other corpus is the ICD-10-CM coding scheme in text format, taken from www.icd10data.com. The ICD-10-CM corpus is to be tagged so that medical context can be understood uniformly, and to this end we are conducting experiments with common NLP techniques and frameworks such as TensorFlow, Keras, LSTM, and Bi-LSTM.
In our preliminary experiments, label sets in the form of (instance, label) pairs were tagged with a Sequential() model built on TensorFlow.Keras and Bi-LSTM NLP algorithms. The maximum accuracy achieved for model validation was 0.8846.
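
As a rough illustration of what label sets in (instance, label) form could look like before they reach a sequence model, the sketch below turns two ICD-10-CM style lines into token-level pairs and then into padded integer sequences. The line layout ("<code> <description>"), the tag names B-CODE, B-DESC and I-DESC, and the helper functions are all assumptions made for this example; the abstract does not specify the tagging scheme or the preprocessing that was actually used.

```python
# Hypothetical sketch: build (instance, label) pairs from ICD-10-CM style lines.
# The "<code> <description>" layout and the tag names are assumptions made for
# illustration; they are not the tagging scheme described in the paper.

ICD_LINES = [
    "E10.9 Type 1 diabetes mellitus without complications",
    "E11.9 Type 2 diabetes mellitus without complications",
]


def to_pairs(line):
    """Split one line into token-level (instance, label) pairs."""
    code, *words = line.split()
    pairs = [(code, "B-CODE")]
    for i, word in enumerate(words):
        pairs.append((word, "B-DESC" if i == 0 else "I-DESC"))
    return pairs


def build_vocab(items):
    """Map each distinct item to an integer id; 0 is reserved for padding."""
    vocab = {}
    for item in items:
        if item not in vocab:
            vocab[item] = len(vocab) + 1
    return vocab


def encode(sentences, token_ids, label_ids, max_len):
    """Convert pair lists to equal-length id sequences, right-padded with 0."""
    xs, ys = [], []
    for pairs in sentences:
        x = [token_ids[tok] for tok, _ in pairs][:max_len]
        y = [label_ids[lab] for _, lab in pairs][:max_len]
        pad = max_len - len(x)
        xs.append(x + [0] * pad)
        ys.append(y + [0] * pad)
    return xs, ys


sentences = [to_pairs(line) for line in ICD_LINES]
token_ids = build_vocab(tok for s in sentences for tok, _ in s)
label_ids = build_vocab(lab for s in sentences for _, lab in s)
X, y = encode(sentences, token_ids, label_ids, max_len=10)

for pairs in sentences:
    print(pairs)
```

Integer sequences of this shape are the usual input to an embedding layer followed by a bidirectional LSTM, which is one way a Sequential() model of the kind mentioned above could consume them. Reserving id 0 for padding keeps padded positions distinguishable from real tokens and labels.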