Combining tumor habitat radiomics and circulating tumor cell data for predicting high-grade pathological components in lung adenocarcinoma
Hongchang Wang, Yan Gu, Gao Wu, Yunqiang Yang, Wenhao Zhang, Guang Mu, Wentao Xue, Chenghao Fu, Yang Xia, Liang Chen, Mei Yuan, Jun Wang
Computer Methods and Programs in Biomedicine, volume 271, Article 108986. Published 2025-07-26. DOI: 10.1016/j.cmpb.2025.108986
https://www.sciencedirect.com/science/article/pii/S0169260725004031
Impact factor: 4.8 (JCR Q1, Computer Science, Interdisciplinary Applications)
Citations: 0
Abstract
Objectives
Lung cancer ranks first among cancers in both incidence and mortality. Early surgical resection significantly improves patient prognosis. Studies have shown that high-grade components in lung adenocarcinoma (LUAD) markedly worsen patient outcomes. Early prediction of these high-grade components is therefore crucial for surgical decision-making.
Methods
We delineated tumor subregions using k-means clustering and excluded features with low reproducibility, as measured by the intraclass correlation coefficient (ICC). Stratified sampling ensured a consistent sample distribution between the training and testing datasets, and the borderline synthetic minority over-sampling technique (BorderlineSMOTE) addressed the data imbalance in the training dataset. After normality testing, features were selected using independent-samples t tests, Mann–Whitney U tests, and Spearman rank correlation. Principal component analysis (PCA) was used to reduce dimensionality, and models were integrated using a stacking approach. Model predictive performance was evaluated using the area under the receiver operating characteristic curve (AUC), and differences between models were tested for significance using the DeLong test.
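The subregion delineation step can be illustrated with a minimal k-means sketch. The abstract does not state which voxel features, how many clusters, or which implementation the authors used, so the two-feature voxels, `k = 3`, and the deterministic farthest-point initialisation below are assumptions chosen only to keep the example small and reproducible.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def init_farthest(points, k):
    """Deterministic initialisation (an assumption of this sketch):
    start at the first point, then repeatedly add the point that is
    farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm. Returns (labels, centroids)."""
    centroids = init_farthest(points, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # assignments stable, so we have converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical voxels: (normalised intensity, normalised local entropy).
voxels = [
    (0.10, 0.20), (0.12, 0.25), (0.08, 0.22),
    (0.50, 0.60), (0.52, 0.55), (0.48, 0.62),
    (0.90, 0.90), (0.88, 0.95), (0.92, 0.88),
]
labels, centroids = kmeans(voxels, k=3)
print(labels)
```

Each label identifies the subregion a voxel belongs to; radiomic features would then be extracted per subregion. In practice the voxel features would typically be standardised before clustering so that no single feature dominates the Euclidean distance.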
Results
The combined habitat and circulating tumor cell (CTC) model showed the best predictive performance for high-grade components in both the training and validation datasets, achieving AUCs of 0.98 [95% CI: 0.95–1.00] and 0.91 [95% CI: 0.82–1.00], respectively. In the training dataset, the combined model's AUC of 0.98 [95% CI: 0.95–1.00] was notably higher than that of the single CTC model, which achieved an AUC of 0.75 [95% CI: 0.64–0.85], and the single subregion model, which had an AUC of 0.94 [95% CI: 0.88–1.00]. Decision-curve analysis demonstrated maximal net benefit at threshold probabilities of 0.2–0.4. In the independent cohort (n = 29), the AUC reached 1.00 [95% CI: 1.00–1.00].
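For readers less familiar with these metrics, the sketch below shows one way an AUC, an interval around it, and a decision-curve net benefit can be computed. It is not the authors' code: the labels and scores are made-up toy values, and the percentile bootstrap is an assumption of this example, since the abstract does not say how the confidence intervals were obtained. Net benefit follows the standard definition, TP/n − (FP/n) · pt/(1 − pt), at threshold probability pt.

```python
import random


def auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case scores higher; ties count as half a win."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(y_true, y_score, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        if 0 < sum(yt) < n:  # skip resamples that contain only one class
            stats.append(auc(yt, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def net_benefit(y_true, y_score, threshold):
    """Net benefit of treating everyone whose score is >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Toy data (hypothetical): 1 = high-grade component present.
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
y_score = [0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.60, 0.70, 0.80, 0.90]

print("AUC:", auc(y_true, y_score))
print("95% bootstrap CI:", bootstrap_ci(y_true, y_score))
for pt in (0.2, 0.3, 0.4):
    print("net benefit at", pt, "=", round(net_benefit(y_true, y_score, pt), 3))
```

A decision curve plots net benefit across a range of thresholds and compares it with the treat-all and treat-none strategies; the loop above evaluates only three thresholds in the 0.2–0.4 range mentioned in the results.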
Conclusion
Combining habitat radiomics and CTC-related clinical models allows for more precise prediction of high-grade pathological components, aiding in clinical preoperative decision-making.
About the journal:
To encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustration of fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; the eventual distribution of demonstrable software to avoid duplication of effort; to provide a forum for discussion and improvement of existing software; to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards and software in biomedicine.
Computer Methods and Programs in Biomedicine covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.