Lung nodule classification using radiomics model trained on degraded SDCT images

IF 4.9 | Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Computer Methods and Programs in Biomedicine | Published: 2024-10-23 | DOI: 10.1016/j.cmpb.2024.108474

Jiaying Liu, Anna Corti, Valentina D.A. Corino, Luca Mainardi

Computer Methods and Programs in Biomedicine, Volume 257, Article 108474 (journal article)

Citations: 0

Abstract

Background and objective

Low-dose computed tomography (LDCT) screening has shown promise in reducing lung cancer mortality. However, it suffers from high false-positive rates and a scarcity of annotated datasets. To address these challenges, we propose a novel approach that uses synthetic LDCT images generated from standard-dose CT (SDCT) scans in the LIDC-IDRI dataset. Our objective is to develop and validate an interpretable radiomics-based model for distinguishing likely benign from likely malignant pulmonary nodules.

Methods

From a total of 1010 CT scans (695 SDCTs and 315 LDCTs), we degraded the SDCTs in the sinogram domain to produce synthetic LDCT images, yielding 1950 nodules for the training set. The 675 nodules from the LDCTs were split, with stratification, into two equal (50%/50%) partitions for validation and testing. Radiomic features were extracted from the nodules, and three feature sets were assessed: (a) shape and size (SS) features only, (b) all features except SS features, and (c) all features. A systematic pipeline was developed to optimize the feature set and evaluate multiple machine learning models. Models were trained on the degraded SDCT nodules and were validated and tested on the LDCT nodules.
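The abstract does not specify the degradation model. A common way to simulate a reduced-dose scan in the sinogram domain works as follows. Each line integral is converted to an expected photon count with the Beer-Lambert law, and the incident photon count is scaled by a dose fraction. Poisson (quantum) noise is then drawn, optional Gaussian electronic noise is added, and the logarithm is taken again to return to line integrals. This approach is assumed here purely for illustration. The function name and the parameters `full_dose_photons`, `dose_fraction`, and `electronic_noise_std` are hypothetical and are not the authors' settings. The sketch also assumes that a sinogram of line integrals is already available. The degraded sinogram would then be reconstructed, for example by filtered back-projection, which is not shown here.

```python
from typing import Optional

import numpy as np


def simulate_low_dose_sinogram(
    sinogram: np.ndarray,
    full_dose_photons: float = 1e5,
    dose_fraction: float = 0.25,
    electronic_noise_std: float = 0.0,
    rng: Optional[np.random.Generator] = None,
) -> np.ndarray:
    """Return a noisy copy of `sinogram` mimicking a lower-dose acquisition.

    `sinogram` holds non-negative line integrals of attenuation.
    Hypothetical helper: the defaults are illustrative, not from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    low_dose_photons = full_dose_photons * dose_fraction

    # Beer-Lambert law: expected photon count reaching each detector bin.
    expected_counts = low_dose_photons * np.exp(-sinogram)

    # Quantum noise is Poisson; electronic noise is modelled as additive Gaussian.
    counts = rng.poisson(expected_counts).astype(float)
    if electronic_noise_std > 0:
        counts += rng.normal(0.0, electronic_noise_std, size=counts.shape)

    # Floor at one photon so the logarithm stays finite in photon-starved bins.
    counts = np.maximum(counts, 1.0)

    # Back to line integrals: p = -ln(N / I0).
    return -np.log(counts / low_dose_photons)


# Usage: a flat 200 x 200 sinogram whose line integrals all equal 2.0.
rng = np.random.default_rng(0)
clean = np.full((200, 200), 2.0)
quarter_dose = simulate_low_dose_sinogram(clean, dose_fraction=0.25, rng=rng)
full_dose = simulate_low_dose_sinogram(clean, dose_fraction=1.0, rng=rng)
print(f"noise std at 25% dose: {quarter_dose.std():.4f}")
print(f"noise std at 100% dose: {full_dose.std():.4f}")
```

Because Poisson noise scales roughly as 1/sqrt(N), each four-fold dose reduction should approximately double the noise standard deviation in the line integrals.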

Results

A logistic regression model trained on three SS features gave the most promising results. On the test set, it achieved mean balanced accuracy, sensitivity, specificity, and AUC-ROC scores of 0.81, 0.76, 0.85, and 0.87, respectively.
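The four reported metrics can be computed from binary labels (1 = likely malignant, 0 = likely benign) and model scores, as in the minimal, dependency-free sketch below. The 0.5 decision threshold is an assumption. The abstract does not say what the reported means are averaged over, so no averaging is shown. In practice, `sklearn.metrics.balanced_accuracy_score` and `sklearn.metrics.roc_auc_score` provide the same quantities.

```python
from typing import Dict, Sequence


def threshold_metrics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Sensitivity, specificity and balanced accuracy at a fixed threshold."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # share of malignant nodules flagged
    specificity = tn / (tn + fp)  # share of benign nodules cleared
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Mean of the per-class recalls, so class imbalance does not inflate it.
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC-ROC as the probability that a random positive outscores a random negative.

    Ties count as half a win (Mann-Whitney formulation); O(n_pos * n_neg).
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: three malignant (1) and three benign (0) nodules.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.1, 0.8, 0.6]
print(threshold_metrics(labels, scores))  # each metric equals 2/3 here
print(round(roc_auc(labels, scores), 3))  # 0.889
```

Balanced accuracy is the natural headline metric here because benign and malignant nodules are rarely balanced in screening data.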

Conclusions

Our study demonstrates the feasibility and effectiveness of using synthetic LDCT images to develop a relatively accurate radiomics-based model for lung nodule classification. This approach addresses challenges associated with LDCT screening and could help improve lung cancer detection and reduce false positives.


Source journal: Computer Methods and Programs in Biomedicine (Engineering & Technology, Engineering: Biomedical)

CiteScore: 12.30

Self-citation rate: 6.60%

Articles published: 601

Review time: 135 days

About the journal: To encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustration of fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; the eventual distribution of demonstrable software to avoid duplication of effort; to provide a forum for discussion and improvement of existing software; to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards and software in biomedicine. Computer Methods and Programs in Biomedicine covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.