Text Report Analysis to Identify Opportunities for Optimizing Target Selection for Chest Radiograph Artificial Intelligence Models
Carl Sabottke, Jason Lee, Alan Chiang, Bradley Spieler, Raza Mushtaq
Journal of Digital Imaging. Published 2024-01-12. DOI: 10.1007/s10278-023-00927-5
Abstract
Our goal was to analyze radiology report text for chest radiographs (CXRs) to identify the imaging findings that most affect report length and complexity. Identifying these findings can highlight opportunities for designing CXR AI systems that increase radiologist efficiency. We retrospectively analyzed text from 210,025 MIMIC-CXR reports and 168,949 reports from our local institution collected from 2019 to 2022. Fifty-nine categories of imaging finding keywords were extracted from reports using natural language processing (NLP), and their impact on report length was assessed using linear regression with and without LASSO regularization. Regression was also used to assess additional factors contributing to report length, such as the signing radiologist and the use of terms of perception. When report word counts were modeled using only imaging finding keyword features, the mean coefficient of determination, R², was 0.469 ± 0.001 for local reports and 0.354 ± 0.002 for MIMIC-CXR. When only the use of terms of perception was considered, mean R² was significantly lower, at 0.067 ± 0.001 for local reports and 0.086 ± 0.002 for MIMIC-CXR. For a combined model of the local report data that accounted for the signing radiologist, imaging finding keywords, and terms of perception, the mean R² was 0.570 ± 0.002. With LASSO, the highest-value coefficients pertained to endotracheal tubes and pleural drains for the local data, and to masses, nodules, and cavitary and cystic lesions for MIMIC-CXR. Natural language processing and regression analysis of radiology report text can highlight imaging targets for AI models that offer opportunities to bolster radiologist efficiency.
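The analysis described in the abstract (keyword extraction followed by regression of report word count on keyword features, with and without LASSO) can be illustrated with a minimal sketch. This is not the authors' pipeline: the three keyword patterns, the toy reports, and all function names below are hypothetical, since the abstract does not list the 59 categories or the software used. LASSO is fit here by plain coordinate descent so that the example needs only the Python standard library.

```python
import re

# Hypothetical keyword categories (the study used 59; they are not listed in the abstract).
KEYWORDS = {
    "endotracheal_tube": r"endotracheal tube|\bett\b",
    "pleural_drain": r"chest tube|pleural drain",
    "nodule": r"\bnodules?\b",
}

def featurize(report):
    """Return (binary keyword indicators, word count) for one report."""
    text = report.lower()
    x = [1.0 if re.search(pat, text) else 0.0 for pat in KEYWORDS.values()]
    return x, float(len(text.split()))

def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_fit(X, y, lam=0.0, n_iter=200):
    """Coordinate descent on (1/2n)*||y - b0 - Xb||^2 + lam*||b||_1.

    With lam=0 this reduces to ordinary least squares.
    """
    n, p = len(X), len(X[0])
    b0, b = sum(y) / n, [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(
                X[i][j] * (y[i] - b0 - sum(X[i][k] * b[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            b[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
        # The intercept is not penalized: set it to the mean residual.
        b0 = sum(y[i] - sum(X[i][k] * b[k] for k in range(p)) for i in range(n)) / n
    return b0, b

def r_squared(X, y, b0, b):
    """Coefficient of determination of the fitted model on (X, y)."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - b0 - sum(xk * bk for xk, bk in zip(xi, b))) ** 2
                 for xi, yi in zip(X, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Invented toy reports, for illustration only.
reports = [
    "No acute cardiopulmonary abnormality.",
    "Endotracheal tube terminates 4 cm above the carina. Lungs are clear.",
    "Right chest tube in place. Small residual pneumothorax. No new consolidation.",
    "Endotracheal tube and left chest tube in stable position. Persistent bibasilar opacities.",
    "Stable 5 mm nodule in the right upper lobe. No pleural effusion.",
    "Lungs are clear. Heart size is normal.",
]
pairs = [featurize(r) for r in reports]
X = [x for x, _ in pairs]
y = [wc for _, wc in pairs]

for lam in (0.0, 0.5):
    b0, b = lasso_fit(X, y, lam=lam)
    coefs = dict(zip(KEYWORDS, (round(c, 3) for c in b)))
    print(f"lam={lam}: R^2={r_squared(X, y, b0, b):.3f}, coefficients={coefs}")
```

In this sketch, each coefficient estimates how many words a keyword category adds to a report, and raising `lam` shrinks the smaller coefficients to zero so that only the categories most associated with report length remain. On real data, R² would normally be estimated on held-out reports rather than on the training set as done here.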
About the journal:
The Journal of Digital Imaging (JDI) is the official peer-reviewed journal of the Society for Imaging Informatics in Medicine (SIIM). JDI’s goal is to enhance the exchange of knowledge encompassed by the general topic of Imaging Informatics in Medicine, including research and practice in clinical, engineering, and information technologies and techniques in all medical imaging environments. JDI topics are of interest to researchers, developers, educators, physicians, and imaging informatics professionals.
Suggested Topics
PACS and component systems; imaging informatics for the enterprise; image-enabled electronic medical records; RIS and HIS; digital image acquisition; image processing; image data compression; 3D, visualization, and multimedia; speech recognition; computer-aided diagnosis; facilities design; imaging vocabularies and ontologies; Transforming the Radiological Interpretation Process (TRIP™); DICOM and other standards; workflow and process modeling and simulation; quality assurance; archive integrity and security; teleradiology; digital mammography; and radiological informatics education.