放射学文本分析系统(RadText):架构和评估。

IEEE International Conference on Healthcare Informatics. IEEE International Conference on Healthcare Informatics Pub Date : 2022-06-01 Epub Date: 2022-09-08 DOI:10.1109/ichi54592.2022.00050

Song Wang, Mingquan Lin, Ying Ding, George Shih, Zhiyong Lu, Yifan Peng

{"title":"放射学文本分析系统(RadText):架构和评估。","authors":"Song Wang, Mingquan Lin, Ying Ding, George Shih, Zhiyong Lu, Yifan Peng","doi":"10.1109/ichi54592.2022.00050","DOIUrl":null,"url":null,"abstract":"Analyzing radiology reports is a time-consuming and error-prone task, which raises the need for an efficient automated radiology report analysis system to alleviate the workloads of radiologists and encourage precise diagnosis. In this work, we present RadText, a high-performance open-source Python radiology text analysis system. RadText offers an easy-to-use text analysis pipeline, including de-identification, section segmentation, sentence split and word tokenization, named entity recognition, parsing, and negation detection. Superior to existing widely used toolkits, RadText features a hybrid text processing schema, supports raw text processing and local processing, which enables higher accuracy, better usability and improved data privacy. RadText adopts BioC as the unified interface, and also standardizes the output into a structured representation that is compatible with Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), which allows for a more systematic approach to observational research across multiple, disparate data sources. We evaluated RadText on the MIMIC-CXR dataset, with five new disease labels that we annotated for this work. RadText demonstrates highly accurate classification performances, with a 0.91 average precision, 0.94 average recall and 0.92 average F-1 score. We also annotated a test set for the five new disease labels to facilitate future research or applications. We have made our code, documentations, examples and the test set available at https://github.com/bionlplab/radtext.","PeriodicalId":73284,"journal":{"name":"IEEE International Conference on Healthcare Informatics. IEEE International Conference on Healthcare Informatics","volume":" ","pages":"288-296"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9484781/pdf/nihms-1836549.pdf","citationCount":"3","resultStr":"{\"title\":\"Radiology Text Analysis System (RadText): Architecture and Evaluation.\",\"authors\":\"Song Wang, Mingquan Lin, Ying Ding, George Shih, Zhiyong Lu, Yifan Peng\",\"doi\":\"10.1109/ichi54592.2022.00050\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Analyzing radiology reports is a time-consuming and error-prone task, which raises the need for an efficient automated radiology report analysis system to alleviate the workloads of radiologists and encourage precise diagnosis. In this work, we present RadText, a high-performance open-source Python radiology text analysis system. RadText offers an easy-to-use text analysis pipeline, including de-identification, section segmentation, sentence split and word tokenization, named entity recognition, parsing, and negation detection. Superior to existing widely used toolkits, RadText features a hybrid text processing schema, supports raw text processing and local processing, which enables higher accuracy, better usability and improved data privacy. RadText adopts BioC as the unified interface, and also standardizes the output into a structured representation that is compatible with Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), which allows for a more systematic approach to observational research across multiple, disparate data sources. We evaluated RadText on the MIMIC-CXR dataset, with five new disease labels that we annotated for this work. RadText demonstrates highly accurate classification performances, with a 0.91 average precision, 0.94 average recall and 0.92 average F-1 score. We also annotated a test set for the five new disease labels to facilitate future research or applications. We have made our code, documentations, examples and the test set available at https://github.com/bionlplab/radtext.\",\"PeriodicalId\":73284,\"journal\":{\"name\":\"IEEE International Conference on Healthcare Informatics. IEEE International Conference on Healthcare Informatics\",\"volume\":\" \",\"pages\":\"288-296\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9484781/pdf/nihms-1836549.pdf\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE International Conference on Healthcare Informatics. IEEE International Conference on Healthcare Informatics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ichi54592.2022.00050\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2022/9/8 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE International Conference on Healthcare Informatics. IEEE International Conference on Healthcare Informatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ichi54592.2022.00050","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2022/9/8 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 3

摘要

分析放射学报告是一项耗时且容易出错的任务，因此需要一个高效的自动化放射学报告分析系统，以减轻放射科医生的工作量并鼓励精确诊断。在这项工作中，我们提出了RadText，一个高性能的开源Python放射学文本分析系统。RadText提供了一个易于使用的文本分析管道，包括去识别、部分分割、句子分割和单词标记化、命名实体识别、解析和否定检测。优于现有广泛使用的工具包，RadText具有混合文本处理模式，支持原始文本处理和本地处理，从而实现更高的准确性，更好的可用性和改进的数据隐私。RadText采用BioC作为统一接口，并将输出标准化为与观察性医疗结果合作伙伴关系(OMOP)公共数据模型(CDM)兼容的结构化表示，该模型允许采用更系统的方法跨多个不同数据源进行观察性研究。我们在MIMIC-CXR数据集上评估了RadText，我们为这项工作注释了五个新的疾病标签。RadText显示出高度准确的分类性能，平均精度为0.91，平均召回率为0.94，平均F-1得分为0.92。我们还为五种新的疾病标签标注了一个测试集，以方便未来的研究或应用。我们已经在https://github.com/bionlplab/radtext上提供了代码、文档、示例和测试集。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

Radiology Text Analysis System (RadText): Architecture and Evaluation.

查看原文本刊更多论文

Radiology Text Analysis System (RadText): Architecture and Evaluation.

Analyzing radiology reports is a time-consuming and error-prone task, which raises the need for an efficient automated radiology report analysis system to alleviate the workloads of radiologists and encourage precise diagnosis. In this work, we present RadText, a high-performance open-source Python radiology text analysis system. RadText offers an easy-to-use text analysis pipeline, including de-identification, section segmentation, sentence split and word tokenization, named entity recognition, parsing, and negation detection. Superior to existing widely used toolkits, RadText features a hybrid text processing schema, supports raw text processing and local processing, which enables higher accuracy, better usability and improved data privacy. RadText adopts BioC as the unified interface, and also standardizes the output into a structured representation that is compatible with Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), which allows for a more systematic approach to observational research across multiple, disparate data sources. We evaluated RadText on the MIMIC-CXR dataset, with five new disease labels that we annotated for this work. RadText demonstrates highly accurate classification performances, with a 0.91 average precision, 0.94 average recall and 0.92 average F-1 score. We also annotated a test set for the five new disease labels to facilitate future research or applications. We have made our code, documentations, examples and the test set available at https://github.com/bionlplab/radtext.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE International Conference on Healthcare Informatics. IEEE International Conference on Healthcare Informatics

自引率

0.00%

发文量