Machine Learning Based Information Extraction for Diabetic Nephropathy in Clinical Text Documents
X. Bao, Shuanglian Xie, Kai Zhang, Kai Song, Yunhaonan Yang
2019 6th International Conference on Systems and Informatics (ICSAI), publication date 2019-11-01
DOI: 10.1109/ICSAI48974.2019.9010211
Citations: 1
Abstract
Diabetic nephropathy is a common complication of diabetes mellitus, and early intervention is important. To build a predictive model for diabetic nephropathy, we constructed a gold-standard corpus from which relevant information can be extracted as predictive risk factors. The study included 3,422 admission summary notes recorded from 2013 to 2017 at a tertiary hospital. We propose an information extraction method based on machine learning models to extract important information from unstructured medical record texts. AdaBoost achieved the best result overall, on Duration of Diabetes (F1 = 0.97), while Family history of heart disease was the most challenging item to extract: the best model for it, an SVM, reached F1 = 0.73. For the other six types of information, the best models achieved F1 values between 0.85 and 0.96, indicating that the approach is feasible for practical application.
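All results above are reported as F1 scores, the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). The abstract does not say how labels are encoded or whether scores are averaged over several classes, so the minimal sketch below assumes the simplest case: one binary label per note (1 = item present, 0 = absent) for a single information type. The function name `f1_score` and the sample labels are hypothetical illustrations, not the authors' code or data.

```python
from typing import Sequence


def f1_score(gold: Sequence[int], predicted: Sequence[int], positive: int = 1) -> float:
    """Binary F1 = 2PR / (P + R) for one extracted information type.

    Assumed (hypothetical) encoding: one label per note, where
    `positive` marks that the item was found in the note.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    # Count true positives, false positives and false negatives.
    tp = sum(1 for g, p in zip(gold, predicted) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, predicted) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, predicted) if g == positive and p != positive)
    if tp == 0:
        # Convention: F1 is 0 when no positive is matched (avoids 0/0).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold and predicted labels for eight notes.
gold = [1, 0, 1, 1, 0, 0, 1, 0]
pred = [1, 0, 0, 1, 1, 0, 1, 0]
print(round(f1_score(gold, pred), 2))  # prints 0.75 (P = R = 3/4)
```

For fields with more than two possible values, such as a duration bucketed into ranges, a per-class F1 would be computed this way and then micro- or macro-averaged; which averaging the study used is not stated in the abstract.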