Machine Learning Based Information Extraction for Diabetic Nephropathy in Clinical Text Documents

2019 6th International Conference on Systems and Informatics (ICSAI). Pub Date: 2019-11-01. DOI: 10.1109/ICSAI48974.2019.9010211

X. Bao, Shuanglian Xie, Kai Zhang, Kai Song, Yunhaonan Yang

Citations: 1

Abstract

Machine Learning Based Information Extraction for Diabetic Nephropathy in Clinical Text Documents

Diabetic nephropathy is a common complication of diabetes mellitus, and early intervention is important. To build a predictive model for diabetic nephropathy, relevant information must first be extracted to serve as predictive risk factors, so we constructed a gold standard corpus. A total of 3422 admission summary notes from 2013 to 2017 in a tertiary hospital were included in the study. We propose an information extraction method based on machine learning models to extract important information from unstructured medical record text. AdaBoost on Duration of Diabetes performed best (F1 = 0.97), while Family history of heart disease was the most challenging to extract: the best model for it, an SVM, reached an F1 of 0.73. For each of the other six information types, the best model's F1 was between 0.85 and 0.96, which suggests that practical application is feasible.
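The abstract reports every result as an F1 value. As a reminder of what that figure measures, here is a minimal sketch of the standard binary F1 computation. It assumes each information type is scored as a positive/negative label per note; the abstract does not say how the authors computed F1, and the function name and toy labels below are hypothetical.

```python
def f1_score(gold, predicted, positive=1):
    """Standard binary F1: harmonic mean of precision and recall.

    Assumption: one gold label and one predicted label per note.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    # Count true positives, false positives and false negatives.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented for illustration only (not data from the paper).
gold = [1, 1, 1, 0, 0, 1]
pred = [1, 0, 1, 0, 1, 1]
print(f1_score(gold, pred))
```

In the toy example there are 3 true positives, 1 false positive and 1 false negative, so precision and recall are both 0.75 and the F1 is 0.75 as well.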
