Asim Abbas, Jamil Hussain, Muhammad Afzal, H. M. Bilal, Sungyoung Lee, Seokhee Jeon
Explicit and Implicit Section Identification from Clinical Discharge Summaries
2022 16th International Conference on Ubiquitous Information Management and Communication (IMCOM), published 2022-01-03
DOI: 10.1109/IMCOM53663.2022.9721771
Citations: 1
Abstract
In the clinical domain, most data is generated as unstructured natural-language text in clinical notes, which contain meaningful but hidden information. Various algorithms have been proposed to recognize and identify the different sections within clinical notes so that they can be converted into a structured format for further processing, such as data storage and retrieval. We propose an algorithm that recognizes and identifies explicitly and implicitly defined section headings, together with the start and end boundaries of each identified section, to enhance information extraction (IE) from clinical notes. A section dictionary containing explicitly defined section names is constructed. An exact term-matching approach is used to identify explicitly defined sections, followed by a partial term-matching procedure that uses the Levenshtein distance algorithm. We evaluated the algorithm on discharge summaries provided by Beth Israel Deaconess Medical Center for the 2010 i2b2 Challenge. The experiments showed that the proposed algorithm achieved an overall precision of 100%, recall of 94.5%, and F-score of 97.17%. For explicit sections alone, it achieved 100% precision, 93.53% recall, and a 96.66% F-score; for implicitly identified sections, it achieved 100% precision, 95.46% recall, and a 97.68% F-score. The main goal of the algorithm is to automatically prepare data for data-driven approaches such as machine learning and deep learning, and to enhance the meaningful use of EHR and CDSS systems, clinical outcomes, and events.
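The two-stage matching described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the dictionary entries, the `max_ratio` threshold of 0.2, and the function names `levenshtein`, `match_heading`, and `split_sections` are all assumptions chosen for illustration. Each line is first compared exactly against the section dictionary, then compared by normalized Levenshtein distance, and a section is assumed to run from its heading to the line before the next detected heading.

```python
from typing import List, Optional, Tuple

# Hypothetical section dictionary; the paper's actual dictionary is not given.
SECTION_DICTIONARY = [
    "history of present illness",
    "past medical history",
    "medications on admission",
    "discharge medications",
    "allergies",
    "discharge diagnosis",
]


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via row-by-row dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def match_heading(line: str, max_ratio: float = 0.2) -> Optional[Tuple[str, str]]:
    """Return (dictionary name, match type) if the line looks like a heading."""
    candidate = line.strip().rstrip(":").strip().lower()
    if not candidate:
        return None
    # Stage 1: exact term matching against the dictionary.
    if candidate in SECTION_DICTIONARY:
        return candidate, "exact"
    # Stage 2: partial matching; accept the closest name if the distance,
    # normalized by the longer string, is within the assumed threshold.
    best = min(SECTION_DICTIONARY, key=lambda name: levenshtein(candidate, name))
    distance = levenshtein(candidate, best)
    if distance / max(len(candidate), len(best)) <= max_ratio:
        return best, "partial"
    return None


def split_sections(lines: List[str]) -> List[Tuple[str, int, int]]:
    """Return (section name, start index, end index), 0-based and inclusive."""
    starts = []
    for index, line in enumerate(lines):
        match = match_heading(line)
        if match is not None:
            starts.append((index, match[0]))
    sections = []
    for k, (start, name) in enumerate(starts):
        # A section ends just before the next heading, or at the last line.
        end = starts[k + 1][0] - 1 if k + 1 < len(starts) else len(lines) - 1
        sections.append((name, start, end))
    return sections
```

Calling `split_sections` on the lines of a note returns one tuple per detected heading. Normalizing the distance by string length keeps short headings from matching unrelated short lines, although the right threshold would need to be tuned on real discharge summaries.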