Automated Information Extraction from Unstructured Hematopathology Reports to Support Response Assessment in Myeloproliferative Neoplasms.

IF 1.3; Zone 4 (Medicine); JCR Q3, COMPUTER SCIENCE, INFORMATION SYSTEMS

Methods of Information in Medicine Pub Date : 2025-05-09 DOI:10.1055/a-2590-6456

Spencer Krichevsky, Evan T Sholle, Prakash M Adekkanattu, Sajjad Abedian, Madhu Ouseph, Elwood Taylor, Ghaith Abu-Zeinah, Diana Jaber, Claudia Sosner, Marika M Cusick, Niamh Savage, Richard T Silver, Joseph M Scandura, Thomas R Campion



Abstract

Assessing treatment response in patients with myeloproliferative neoplasms is difficult because data components exist in unstructured bone marrow pathology (hematopathology) reports, which require specialized manual annotation and interpretation. Although natural language processing (NLP) has been successfully implemented for the extraction of features from solid tumor reports, little is known about its application to hematopathology. An open-source NLP framework called Leo was implemented to parse document segments and extract concept phrases used for assessing responses in myeloproliferative neoplasms. A reference standard was generated through the manual review of hematopathology notes. Compared with a reference standard (n = 300 reports), our NLP method extracted features such as aspirate myeloblasts (F1 = 98%) and biopsy reticulin fibrosis (F1 = 93%) with high accuracy. However, other values, such as myeloblasts from the biopsy (F1 = 6%) and via flow cytometry (F1 = 8%), were affected by sparsity that reflects reporting conventions. The four features with the highest clinical importance were extracted with F1 scores exceeding 90%. Whereas manual annotation of 300 reports required 30 hours of staff effort, automated NLP required 3.5 hours of runtime for 34,301 reports. To the best of our knowledge, this is among the first studies to demonstrate the application of NLP to hematopathology for clinical feature extraction. The approach may inform efforts at other institutions, and the code is available at https://github.com/wcmc-research-informatics/BmrExtractor.
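The abstract reports two kinds of numbers: per-feature F1 scores and a workload comparison. The following minimal Python sketch shows how such figures are typically computed. It is not taken from the BmrExtractor code. The `Counts` class, the function names, and the toy confusion counts are hypothetical and serve only to illustrate the formulas. The report counts and hours are the ones stated in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Per-feature confusion counts against a manually annotated reference."""
    tp: int  # extracted value matches the reference
    fp: int  # extracted value is wrong or has no reference counterpart
    fn: int  # reference value was not extracted


def f1_score(c: Counts) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the score is undefined."""
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


def reports_per_hour(n_reports: int, hours: float) -> float:
    """Average number of reports processed per hour."""
    return n_reports / hours


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the formula.
    toy = Counts(tp=90, fp=5, fn=5)
    print(f"toy F1: {f1_score(toy):.3f}")

    # Figures stated in the abstract.
    manual = reports_per_hour(300, 30.0)       # staff effort
    automated = reports_per_hour(34_301, 3.5)  # NLP runtime
    print(f"manual: {manual:.0f} reports/h, automated: {automated:.0f} reports/h")
```

With the abstract's figures, manual annotation works out to about 10 reports per hour and the NLP run to about 9,800 reports per hour. The two are not strictly comparable, since one measures staff effort and the other machine runtime.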


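Source journal

Methods of Information in Medicine (Medicine; Computer Science, Information Systems)

CiteScore: 3.70

Self-citation rate: 11.80%

Publication volume: (not given)

Review time: 6-12 weeks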

Journal description: Good medicine and good healthcare demand good information. Since the journal's founding in 1962, Methods of Information in Medicine has stressed the methodology and scientific fundamentals of organizing, representing and analyzing data, information and knowledge in biomedicine and health care. Covering publications in the fields of biomedical and health informatics, medical biometry, and epidemiology, the journal publishes original papers, reviews, reports, opinion papers, editorials, and letters to the editor. From time to time, the journal publishes articles on particular focus themes as part of a journal issue.