Automatic Extraction of Breast Cancer Information from Clinical Reports
C. Bretschneider, S. Zillner, M. Hammon, P. Gass, Daniel Sonntag
2017 IEEE 30th International Symposium on Computer-Based Medical Systems (CBMS), published 2017-06-22
DOI: 10.1109/CBMS.2017.138 (https://doi.org/10.1109/CBMS.2017.138)
Citation count: 7
Abstract
The majority of clinical data is available only in unstructured text documents. Its automated use in data-driven clinical applications, such as quality assurance and clinical decision support through treatment suggestions, is therefore hindered, because it requires considerable manual annotation effort. In this work, we introduce a system for the automated processing of clinical reports of mamma carcinoma (breast cancer) patients that allows for the automatic extraction and seamless processing of relevant textual features. Its underlying information extraction pipeline employs a rule-based grammar approach that is integrated with semantic technologies to determine the relevant information from the patient record. The system, developed with nine thousand clinical documents, reaches accuracy levels of 90% for lymph node status and 69% for the structurally most complex feature, the hormone status.
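
To make the idea of rule-based extraction concrete, the following is a minimal sketch in Python. It is not the authors' pipeline: the abstract does not give the grammar rules, the report language, or the semantic resources used. The patterns `NODE_RULE` and `HORMONE_RULE`, the function `extract_features`, and the example report text are all hypothetical. The sketch assumes English text with TNM-style nodal categories (such as "pN1a") and receptor mentions of the form "ER positive".

```python
import re

# Hypothetical rules for illustration only; the paper's actual grammar
# is not described in the abstract.
# TNM-style nodal category, e.g. "pN0", "pN1a", "cN2".
NODE_RULE = re.compile(r"\b[cp]N([0-3])[a-c]?\b")
# Receptor name followed by a polarity word, e.g. "ER positive", "PR: negative".
HORMONE_RULE = re.compile(r"\b(ER|PR)\b\s*:?\s*(positive|negative)", re.IGNORECASE)


def extract_features(report: str) -> dict:
    """Return lymph node status and hormone receptor status found in a report."""
    features = {"lymph_node_status": None, "hormone_status": {}}

    node = NODE_RULE.search(report)
    if node:
        # N0 means no regional lymph node involvement; N1 to N3 mean involvement.
        features["lymph_node_status"] = "negative" if node.group(1) == "0" else "positive"

    # Collect one polarity per receptor; a later mention overwrites an earlier one.
    for receptor, polarity in HORMONE_RULE.findall(report):
        features["hormone_status"][receptor.upper()] = polarity.lower()

    return features


# Made-up example sentence, not taken from the paper's data.
example = "Invasive ductal carcinoma, pT2 pN1a. ER positive, PR: negative."
print(extract_features(example))
```

Hormone status is collected per receptor in this sketch, which hints at why the abstract calls it the structurally most complex feature: a single report can carry several receptor values, each with its own polarity, whereas lymph node status reduces to one category.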