Development and validation of a habitat-based computed tomography radiomics model for differentiating isolated lung cancer, isolated tuberculoma, and coexistence of tuberculosis with lung cancer: a dual-center retrospective study.
Ning Shi, Zhenzhen Wan, Limin Wen, Zhenpeng Liu, Bing Wang, Ye Li, Peng Xiong, Dailun Hou, Xiuling Liu
Translational Lung Cancer Research, 15(3):49. Published 2026-03-23 (Epub 2026-02-26). DOI: 10.21037/tlcr-2025-1-1381 (https://doi.org/10.21037/tlcr-2025-1-1381). Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13071703/pdf/
Citations: 0
Abstract
Background: Isolated lung cancer (ILC), isolated tuberculoma, and coexistence of tuberculosis with lung cancer (CTBLC) exhibit similarities in computed tomography (CT) imaging features but great differences in pathology, treatment strategy, and prognosis; therefore, accurate differential diagnosis is critical for clinical management and patient safety. The purpose of this study was to develop and validate a habitat-based CT radiomics model that integrates intralesional subregion features with whole-lesion features for reliable differentiation among these three conditions.
Methods: This study retrospectively included 317 patients with ILC, tuberculoma, or CTBLC from 2018 to 2022. Among these, 239 patients from Beijing Chest Hospital, Capital Medical University (Center 1) formed the training and internal test cohorts, and 78 from Infectious Disease Hospital of Heilongjiang Province (Center 2) constituted an external validation cohort. Volumes of interest (VOIs) were manually outlined by two experienced radiologists on CT images. Then each lesion was partitioned into two subregions using K-means clustering. A total of 1,218 three-dimensional whole-lesion radiomics features and 2,436 habitat features were extracted. Feature selection was performed via least absolute shrinkage and selection operator (LASSO). Six classification algorithms were trained and evaluated. To distinguish ILC, tuberculoma, and CTBLC, three models were developed: (I) a traditional radiomics model using only whole-lesion radiomics features; (II) a habitat model based on intralesional habitat features; and (III) a combined habitat-radiomics model fusing both feature sets. Discrimination was assessed using the area under the curve (AUC), and SHapley Additive exPlanations (SHAP) was used to interpret the optimal model and visualize individual prediction decisions.
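The habitat partitioning step can be illustrated with a minimal sketch. It assumes clustering on voxel intensity alone, uses a plain Lloyd's-algorithm K-means with k = 2, and runs on made-up Hounsfield-unit values; the abstract does not state which voxel-level inputs or which K-means implementation the authors used, so the function name and data here are hypothetical. The final lines note that the reported feature counts are consistent with extracting the same 1,218 features from each of the two subregions, which is an inference from the numbers rather than a stated fact.

```python
from statistics import mean


def two_habitat_kmeans(intensities, max_iter=100):
    """Split voxel intensities into two clusters with Lloyd's algorithm (k=2).

    Hypothetical helper for illustration; not the authors' implementation.
    """
    # Deterministic initialisation: start the centroids at the extremes.
    centroids = [min(intensities), max(intensities)]
    labels = [0] * len(intensities)
    for _ in range(max_iter):
        # Assignment step: each voxel joins its nearest centroid.
        new_labels = [
            0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
            for v in intensities
        ]
        # Update step: recompute each centroid as the mean of its members.
        new_centroids = []
        for k in (0, 1):
            members = [v for v, lab in zip(intensities, new_labels) if lab == k]
            new_centroids.append(mean(members) if members else centroids[k])
        # Stop once neither the labels nor the centroids change.
        if new_labels == labels and new_centroids == centroids:
            break
        labels, centroids = new_labels, new_centroids
    return labels, centroids


# Made-up Hounsfield-unit values for the voxels inside one VOI (assumption).
voxels_hu = [-20, -15, -10, 35, 40, 45, 50]
labels, centroids = two_habitat_kmeans(voxels_hu)

# Counts from the abstract: 2 subregions x 1,218 features = 2,436 habitat features.
WHOLE_LESION_FEATURES = 1218
N_HABITATS = 2
habitat_features = WHOLE_LESION_FEATURES * N_HABITATS
```

In this toy case the low-attenuation and high-attenuation voxels fall into separate habitats, each of which would then be passed to the feature extractor as its own mask.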
Results: The combined habitat-radiomics model, which integrates habitat and whole-lesion features, outperformed the traditional radiomics model. Among the classifiers evaluated, the extreme gradient boosting (XGBoost)-based fusion model achieved the best performance in the internal test cohort (mean AUC = 0.934), surpassing both the radiomics model (mean AUC = 0.910) and the habitat model (mean AUC = 0.873). For individual classes, the fusion model yielded AUCs of 0.911 (ILC), 0.955 (tuberculoma), and 0.937 (CTBLC). The combined habitat-radiomics model also showed better discriminative performance than the interpretations of three radiologists. SHAP plots revealed key features and presented individual visualizations of each prediction.
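As a quick arithmetic check, the reported mean AUC of the fusion model matches the unweighted average of its three per-class AUCs. The sketch below assumes that "mean AUC" is a macro average over the classes; the abstract does not define the averaging scheme, so this is only a consistency check on the published figures.

```python
from statistics import mean

# Per-class AUCs of the XGBoost fusion model, as reported in the abstract.
per_class_auc = {"ILC": 0.911, "tuberculoma": 0.955, "CTBLC": 0.937}

# Macro average: each class contributes equally (assumed averaging scheme).
macro_auc = mean(list(per_class_auc.values()))

print(round(macro_auc, 3))  # 0.934, the reported mean AUC
```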
Conclusions: A habitat-based CT radiomics approach that incorporates intralesional subregion features into whole-lesion signatures improves differentiation among ILC, tuberculoma, and CTBLC. This combined model provides a noninvasive tool to support clinical decision-making.
About the journal:
Translational Lung Cancer Research (TLCR, Transl Lung Cancer Res, Print ISSN 2218-6751; Online ISSN 2226-4477) is an international, peer-reviewed, open-access journal founded in March 2012. TLCR is indexed by PubMed/PubMed Central and the Chemical Abstracts Service (CAS) Databases. It was published quarterly in its first year and has been published bimonthly since February 2013. It provides practical, up-to-date information on the prevention, early detection, diagnosis, and treatment of lung cancer. Specific areas of interest include, but are not limited to, multimodality therapy, markers, imaging, tumor biology, pathology, chemoprevention, and technical advances related to lung cancer.