Explicit and Implicit Section Identification from Clinical Discharge Summaries

2022 16th International Conference on Ubiquitous Information Management and Communication (IMCOM) Pub Date : 2022-01-03 DOI:10.1109/IMCOM53663.2022.9721771

Asim Abbas, Jamil Hussain, Muhammad Afzal, H. M. Bilal, Sungyoung Lee, Seokhee Jeon


Citations: 1

Abstract

In the clinical domain, most data is generated as unstructured natural language in clinical notes, which contain meaningful but hidden information. Various algorithms have been proposed to recognize and identify the sections within clinical notes so that the notes can be converted into a structured data format for storage and retrieval. We propose an algorithm that recognizes and identifies explicitly and implicitly defined section headings, together with the start and end boundaries of each identified section, to enhance information extraction (IE) from clinical notes. A section dictionary is constructed that contains the explicitly defined section names. An exact term matching approach is used to identify explicitly defined sections, and a partial term matching procedure is followed using the Levenshtein distance algorithm. We evaluated the algorithm on discharge summaries provided by Beth Israel Deaconess Medical Center for the 2010 i2b2 Challenge. The experiments showed that the proposed algorithm achieved an overall precision of 100%, recall of 94.5%, and F-score of 97.17%. For explicit sections, the results were 100% precision, 93.53% recall, and 96.66% F-score; for implicitly identified sections, they were 100% precision, 95.46% recall, and 97.68% F-score. The main goal of the algorithm is to automatically prepare data for data-driven approaches such as machine learning and deep learning, and to enhance the meaningful use of EHR and CDSS systems, clinical outcomes, and events.
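
The matching steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the dictionary entries, the colon-based heading heuristic, the `max_ratio` threshold, and all function names are assumptions made for illustration. The sketch labels each hit as "exact" or "partial" by how it matched; the abstract does not spell out how those two routes map onto its explicit and implicit categories, so no such mapping is assumed here.

```python
from dataclasses import dataclass

# Hypothetical dictionary of canonical section names (not taken from the paper).
SECTION_DICTIONARY = [
    "history of present illness",
    "past medical history",
    "discharge medications",
    "discharge diagnosis",
]


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


@dataclass
class Section:
    name: str   # canonical name from the dictionary
    match: str  # "exact" or "partial"
    start: int  # index of the heading line
    end: int    # index of the last line of the section (inclusive)


def match_heading(line: str, max_ratio: float = 0.25):
    """Return (canonical_name, match_kind), or None if the line is not a heading.

    Assumption: a heading candidate is the text before the first colon,
    or the whole line when there is no colon.
    """
    candidate = line.split(":", 1)[0].strip().lower()
    if not candidate:
        return None
    if candidate in SECTION_DICTIONARY:
        return candidate, "exact"
    # Partial matching: accept the nearest dictionary entry if it is close enough.
    best = min(SECTION_DICTIONARY, key=lambda name: levenshtein(candidate, name))
    ratio = levenshtein(candidate, best) / max(len(candidate), len(best))
    if ratio <= max_ratio:
        return best, "partial"
    return None


def identify_sections(lines):
    """Find headings and set each section's end just before the next heading."""
    sections = []
    for index, line in enumerate(lines):
        hit = match_heading(line)
        if hit is None:
            continue
        if sections:
            sections[-1].end = index - 1
        sections.append(Section(hit[0], hit[1], index, len(lines) - 1))
    return sections


if __name__ == "__main__":
    note = [
        "HISTORY OF PRESENT ILLNESS:",
        "The patient presented with chest pain.",
        "Past Medcal Histroy:",  # misspelled heading, caught by partial matching
        "Hypertension, diabetes.",
        "Discharge Medications: aspirin 81 mg daily",
    ]
    for section in identify_sections(note):
        print(section)
```

Normalizing the edit distance by the longer string's length keeps ordinary body lines from matching short dictionary entries by accident; the 0.25 cutoff is an arbitrary starting point that would need tuning on real notes.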

