Information extraction from clinical records: an example for breast cancer
I. Lazić, N. Jakovljević, J. Boban, I. Nosek, T. Lončar-Turukalo
2022 IEEE 21st Mediterranean Electrotechnical Conference (MELECON), published 2022-06-14
DOI: 10.1109/MELECON53508.2022.9842995 (https://doi.org/10.1109/MELECON53508.2022.9842995)
Citations: 1
Abstract
Extracting relevant information from electronic health records (EHR) can facilitate large-scale clinical studies of specific diseases, helping to uncover the diversity of their biological and clinical signatures as well as patterns of treatment and prognosis. The variety of EHR formats and the use of clinical narrative present significant challenges to this task. In this work we describe a process for automated information extraction from an oncology hospital's clinical reports on 2966 subjects with suspected or confirmed breast cancer. The lack of open medical term dictionaries for the Serbian language, together with the variety of clinical data types required, calls for rule-based approaches that use exact matches, regular expressions, hierarchical rules, and customized mini dictionaries to analyze clinical text. The approach was validated against manually extracted clinical data for 50 breast cancer patients. Accuracy varied by field, from 71.3% to 100%, indicating that certain relevant fields can be captured successfully, while the accurate extraction of more descriptive features requires more sophisticated natural language processing tools.
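To make the rule-based techniques named in the abstract more concrete, the following is a minimal sketch in Python, not the authors' implementation. It illustrates an exact match against a mini dictionary, a regular expression for a numeric value, a hierarchical rule that prefers a labeled field over free narrative, and a per-field accuracy check against manual annotations. All field names, patterns, dictionary entries, and sample sentences are hypothetical English placeholders; the abstract does not list the actual fields or the Serbian terms used.

```python
import re

# Hypothetical mini dictionary: surface form -> normalized value.
# English placeholders only; the study's Serbian dictionaries are not given.
LATERALITY_TERMS = {"left breast": "left", "right breast": "right"}

# Hypothetical regular expression for a numeric field (size in millimetres).
TUMOR_SIZE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*mm", re.IGNORECASE)

# Hypothetical patterns for one field, ordered from most to least specific.
ER_LABELED_RE = re.compile(r"\bER\s*:\s*(positive|negative)\b", re.IGNORECASE)
ER_NARRATIVE_RE = re.compile(
    r"estrogen receptor[- ](positive|negative)", re.IGNORECASE
)


def extract_laterality(text):
    """Exact match: return the value of the first dictionary term found."""
    lowered = text.lower()
    for term, value in LATERALITY_TERMS.items():
        if term in lowered:
            return value
    return None


def extract_tumor_size_mm(text):
    """Regular expression: return the first 'N mm' value as a float."""
    match = TUMOR_SIZE_RE.search(text)
    return float(match.group(1)) if match else None


def extract_er_status(text):
    """Hierarchical rule: try the labeled form first, then the narrative form."""
    for pattern in (ER_LABELED_RE, ER_NARRATIVE_RE):
        match = pattern.search(text)
        if match:
            return match.group(1).lower()
    return None


def extract_record(text):
    """Run every field extractor on one report; missing fields are None."""
    return {
        "laterality": extract_laterality(text),
        "tumor_size_mm": extract_tumor_size_mm(text),
        "er_status": extract_er_status(text),
    }


def field_accuracy(extracted, reference):
    """Per-field share of records whose extracted value equals the manual one."""
    fields = reference[0].keys()
    return {
        field: sum(e[field] == r[field] for e, r in zip(extracted, reference))
        / len(reference)
        for field in fields
    }
```

Calling `extract_record` on a report string returns one dictionary per report, and `field_accuracy` compares a list of such dictionaries with a manually annotated list of the same length, giving one accuracy value per field, in the spirit of the field-dependent validation the abstract reports. Exact equality is a strict criterion; descriptive free-text fields would likely need a softer comparison.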