Text Report Analysis to Identify Opportunities for Optimizing Target Selection for Chest Radiograph Artificial Intelligence Models

IF 2.9 · CAS Zone 2 (Engineering & Technology) · JCR Q2 RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING

Journal of Digital Imaging · Published 2024-01-12 · DOI: 10.1007/s10278-023-00927-5

Carl Sabottke, Jason Lee, Alan Chiang, Bradley Spieler, Raza Mushtaq



Abstract

Our goal was to analyze radiology report text for chest radiographs (CXRs) to identify the imaging findings that have the greatest impact on report length and complexity. Identifying these findings can highlight opportunities for designing CXR AI systems that increase radiologist efficiency. We retrospectively analyzed text from 210,025 MIMIC-CXR reports and 168,949 reports collected at our local institution from 2019 to 2022. Fifty-nine categories of imaging-finding keywords were extracted from the reports using natural language processing (NLP), and their impact on report length was assessed using linear regression with and without LASSO regularization. Regression was also used to assess additional factors contributing to report length, such as the signing radiologist and the use of terms of perception. When modeling CXR report word counts using only imaging-finding keyword features, the mean coefficient of determination (R²) was 0.469 ± 0.001 for local reports and 0.354 ± 0.002 for MIMIC-CXR. Mean R² was significantly lower when considering only the use of terms of perception: 0.067 ± 0.001 for local reports and 0.086 ± 0.002 for MIMIC-CXR. For a combined model of the local report data accounting for the signing radiologist, imaging-finding keywords, and terms of perception, the mean R² was 0.570 ± 0.002. With LASSO, the highest-value coefficients pertained to endotracheal tubes and pleural drains for the local data and to masses, nodules, and cavitary and cystic lesions for MIMIC-CXR. Natural language processing and regression analysis of radiology report text can highlight imaging targets for AI models that offer opportunities to bolster radiologist efficiency.
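The abstract describes a two-stage pipeline: keyword categories are extracted from each report, and report word count is then regressed on those features with and without LASSO. The Python sketch below illustrates that idea under several assumptions that do not come from the paper. The four categories in `CATEGORIES` and their regular expressions are hypothetical stand-ins for the study's 59 categories. Features are binary presence indicators (the abstract does not say whether presence or counts were used). Word count is a plain whitespace split, negated mentions such as "No pneumothorax" are not handled, and the regression is a small pure-Python coordinate-descent solver rather than whatever software the authors used. It also computes a single in-sample R², whereas the abstract's mean ± values imply repeated fits under a protocol not described here.

```python
import math
import re

# Hypothetical keyword categories for illustration only. The study used 59
# finding categories, but their exact keyword lists are not given here.
CATEGORIES = {
    "endotracheal_tube": [r"endotracheal tube", r"\bett\b"],
    "pleural_drain": [r"pleural drain", r"chest tube"],
    "nodule": [r"\bnodules?\b"],
    "effusion": [r"\beffusions?\b"],
}

# Invented toy reports (not from MIMIC-CXR or the local dataset).
EXAMPLE_REPORTS = [
    "No acute cardiopulmonary process.",
    "Endotracheal tube tip terminates 4 cm above the carina. No pneumothorax.",
    "Right chest tube in place. Small residual right effusion. No pneumothorax.",
    "Small left pleural effusion.",
    "A 6 mm nodule in the right upper lobe. Recommend CT follow-up.",
]


def keyword_features(report, categories=CATEGORIES):
    """Return one binary indicator per category (1.0 if any pattern matches)."""
    text = report.lower()
    return [1.0 if any(re.search(p, text) for p in patterns) else 0.0
            for patterns in categories.values()]


def word_count(report):
    """Report length as a whitespace-delimited word count."""
    return len(report.split())


def fit_lasso(X, y, alpha=0.0, n_iter=1000):
    """Linear regression by coordinate descent with an optional L1 penalty.

    Minimizes (1 / (2n)) * ||y - b0 - X b||^2 + alpha * ||b||_1.
    With alpha=0.0 this converges to ordinary least squares when X has
    full column rank. Returns (intercept, coefficients).
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Center features and target; the intercept is recovered at the end.
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual when all coefficients are 0
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            z = sum(Xc[i][j] ** 2 for i in range(n)) / n
            if z == 0.0:
                continue  # constant feature: leave its coefficient at 0
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * b[j]) for i in range(n)) / n
            # Soft-thresholding applies the L1 penalty (no-op when alpha == 0).
            new_bj = math.copysign(max(abs(rho) - alpha, 0.0), rho) / z
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                b[j] = new_bj
    intercept = y_mean - sum(b[j] * x_mean[j] for j in range(p))
    return intercept, b


def predict(intercept, coefs, X):
    """Linear predictions intercept + X @ coefs for each row of X."""
    return [intercept + sum(c * x for c, x in zip(coefs, row)) for row in X]


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    X = [keyword_features(r) for r in EXAMPLE_REPORTS]
    y = [word_count(r) for r in EXAMPLE_REPORTS]
    for alpha in (0.0, 1.0):
        b0, b = fit_lasso(X, y, alpha=alpha)
        print(f"alpha={alpha}: R2={r_squared(y, predict(b0, b, X)):.3f}",
              dict(zip(CATEGORIES, (round(c, 2) for c in b))))
```

On the five invented reports, five observations and five parameters allow an exact fit, so the unpenalized run (`alpha=0.0`) should recover an intercept of about 4 words and coefficients of about 7, 7, 8, and 0 for the four hypothetical categories; `alpha=1.0` shrinks the coefficients and lowers in-sample R². At the scale of the study (hundreds of thousands of reports), scikit-learn's `Lasso`, which minimizes the same objective, would be the more usual choice.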


Source journal: Journal of Digital Imaging (Medicine: Nuclear Medicine)

CiteScore: 7.50 · Self-citation rate: 6.80% · Articles published: 192 · Review time: 6-12 weeks

About the journal: The Journal of Digital Imaging (JDI) is the official peer-reviewed journal of the Society for Imaging Informatics in Medicine (SIIM). JDI’s goal is to enhance the exchange of knowledge encompassed by the general topic of Imaging Informatics in Medicine, such as research and practice in clinical, engineering, and information technologies and techniques in all medical imaging environments. JDI topics are of interest to researchers, developers, educators, physicians, and imaging informatics professionals. Suggested topics: PACS and component systems; imaging informatics for the enterprise; image-enabled electronic medical records; RIS and HIS; digital image acquisition; image processing; image data compression; 3D, visualization, and multimedia; speech recognition; computer-aided diagnosis; facilities design; imaging vocabularies and ontologies; Transforming the Radiological Interpretation Process (TRIP™); DICOM and other standards; workflow and process modeling and simulation; quality assurance; archive integrity and security; teleradiology; digital mammography; and radiological informatics education.