Auxiliary Diagnosis of Pulmonary Nodules' Benignancy and Malignancy Based on Machine Learning: A Retrospective Study.
Authors: Wanling Wang, Bingqing Yang, Huan Wu, Hebin Che, Yue Tong, Bozun Zhang, Hongwu Liu, Yuanyuan Chen
Journal: Journal of Multidisciplinary Healthcare, vol. 18, pp. 3735-3748 (Journal Article; published 2025-06-27; eCollection)
DOI: https://doi.org/10.2147/JMDH.S518166
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12212436/pdf/
Journal metrics: impact factor 2.4; JCR Q2 (Health Care Sciences & Services)
Background: Lung cancer, one of the most lethal malignancies globally, often presents insidiously as pulmonary nodules. Its nonspecific clinical presentation and heterogeneous imaging characteristics hinder accurate differentiation between benign and malignant lesions, while biopsy's invasiveness and procedural constraints underscore the critical need for non-invasive early diagnostic approaches.
Methods: In this retrospective study, we analyzed outpatient and inpatient records from the First Medical Center of Chinese PLA General Hospital between 2011 and 2021, focusing on pulmonary nodules measuring 5-30 mm on CT scans without overt signs of malignancy. Pathological examination served as the reference standard. Comparative experiments evaluated SVM, RF, XGBoost, FNN, and Atten_FNN on AUC, sensitivity, and specificity. The dataset was split 70%/30%, and stratified five-fold cross-validation was applied to the training set. The optimal model was interpreted with SHAP to identify the most influential predictive features.
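The splitting scheme described in the Methods can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the authors' code: the function names `stratified_split` and `stratified_kfold`, the random seed, and the reading of the 30% portion as a held-out set are all assumptions. The labels are built only from the class counts reported in the Results (1156 benign, 2199 malignant); no patient features are modelled.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction, seed=0):
    """Return (train_idx, test_idx) with class proportions preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)

def stratified_kfold(indices, labels, k=5, seed=0):
    """Yield (fit_idx, val_idx) pairs; each class is dealt round-robin into k folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for pos, i in enumerate(idxs):
            folds[pos % k].append(i)
    for f in range(k):
        val = sorted(folds[f])
        fit = sorted(i for g in range(k) if g != f for i in folds[g])
        yield fit, val

# Hypothetical label vector: 0 = benign, 1 = malignant (counts taken from the abstract).
labels = [0] * 1156 + [1] * 2199

# Assumed reading of "split 70%/30%": 70% for training, 30% held out.
train_idx, test_idx = stratified_split(labels, test_fraction=0.30)

# Stratified five-fold cross-validation on the training portion only.
folds = list(stratified_kfold(train_idx, labels, k=5))
for n, (fit, val) in enumerate(folds, start=1):
    malignant_share = sum(labels[i] for i in val) / len(val)
    print(f"fold {n}: fit={len(fit)} val={len(val)} malignant share={malignant_share:.3f}")
```

Stratifying both steps keeps the benign-to-malignant ratio nearly identical in every fold, which matters here because the two classes are imbalanced.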
Results: This study enrolled 3355 patients, including 1156 with benign and 2199 with malignant pulmonary nodules. The Atten_FNN model demonstrated superior performance in five-fold cross-validation, achieving an AUC of 0.82, accuracy of 0.75, sensitivity of 0.77, and F1 score of 0.80. SHAP analysis revealed key predictive factors: demographic variables (age, sex, BMI), CT-derived features (maximum nodule diameter, morphology, density, calcification, ground-glass opacity), and laboratory biomarkers (neuroendocrine markers, carcinoembryonic antigen).
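For readers less familiar with the reported metrics, the sketch below shows how accuracy, sensitivity, specificity, and F1 follow from confusion-matrix counts. It is an illustration under stated assumptions, not the authors' evaluation code: the helper names `safe_div` and `binary_metrics` are hypothetical, and malignant is assumed to be the positive class, which the abstract does not state. The only inputs are the class counts reported above, used to build a trivial reference classifier that labels every nodule malignant.

```python
def safe_div(num, den):
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0

def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "sensitivity": safe_div(tp, tp + fn),   # recall on the positive class
        "specificity": safe_div(tn, tn + fp),   # recall on the negative class
        "precision": safe_div(tp, tp + fp),
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),
    }

# Reference point: predict "malignant" for all 3355 patients
# (assumes malignant = positive; 2199 malignant, 1156 benign).
baseline = binary_metrics(tp=2199, fp=1156, tn=0, fn=0)
for name, value in baseline.items():
    print(f"{name}: {value:.3f}")
```

Because roughly two thirds of the cohort is malignant, threshold-dependent metrics such as accuracy and F1 are best read alongside this kind of reference point, whereas AUC does not depend on a single decision threshold.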
Conclusion: This study integrates electronic medical records and pathology data to predict pulmonary nodule malignancy using machine learning and deep learning models. SHAP-based interpretability analysis uncovered key clinical determinants. Acknowledging limitations in cross-center generalizability, we propose the development of a multimodal diagnostic system that combines CT imaging and radiomics, to be validated in multi-center prospective cohorts to facilitate clinical translation. This framework establishes a novel paradigm for early precision diagnosis of lung cancer.
About the journal:
The Journal of Multidisciplinary Healthcare (JMDH) aims to represent and publish research in healthcare areas delivered by practitioners of different disciplines. This includes studies and reviews conducted by multidisciplinary teams, as well as research that evaluates or reports the results or conduct of such teams or of healthcare processes in general. The journal covers a very wide range of areas and welcomes submissions from practitioners at all levels and from all over the world. Good healthcare is not bounded by person, place, or time, and the journal aims to reflect this. The JMDH is published as an open-access journal so that this wide range of practical, patient-relevant research is available to practitioners, who can access and use it immediately upon publication.