Lung Cancer Detection Using Bayesian Networks: A Retrospective Development and Validation Study on a Danish Population of High-Risk Individuals
Margrethe Bang Henriksen, Florian Van Daalen, Leonard Wee, Torben Frøstrup Hansen, Lars Henrik Jensen, Claus Lohman Brasen, Ole Hilberg, Inigo Bermejo
Cancer Medicine, volume 14, issue 3, e70458. Published 2025-02-01. Journal Article.
DOI: 10.1002/cam4.70458 (https://doi.org/10.1002/cam4.70458)
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11783238/pdf/
Journal impact factor: 2.9. JCR: Q2 (Oncology).
Abstract
Background: Lung cancer (LC) is the top cause of cancer deaths globally, prompting many countries to adopt LC screening programs. While screening typically relies on age and smoking intensity, more efficient risk models exist. We devised a Bayesian network (BN) for LC detection, testing its resilience with varying degrees of missing data and comparing it to a prior machine learning (ML) model.
Methods: We analyzed data from 9940 patients referred for LC assessment in Southern Denmark from 2009 to 2018. Variables included age, sex, smoking, and lab results. Our experiments varied missing data (0%-30%), BN structure (expert-based vs. data-driven), and discretization method (standard vs. data-driven).
Results: Across all missing data levels, area under the curve (AUC) remained steady, ranging from 0.737 to 0.757, compared to the ML model's AUC of 0.77. BN structure and discretization method had minimal impact on performance. BNs were well calibrated overall, with a net benefit in decision curve analysis when predicted risk exceeded 5%.
Conclusion: BN models showed resilience with up to 30% missing values. Moreover, these BNs exhibited similar performance, calibration, and clinical utility compared to the machine learning model developed using the same dataset. Considering their effectiveness in handling missing data, BNs emerge as a relevant method for the development of future lung cancer detection models.
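The net benefit reported in the Results comes from decision curve analysis. As a rough illustration of that quantity, the sketch below computes the standard net benefit, NB = TP/N - (FP/N) * pt / (1 - pt), at a risk threshold pt and compares it with the "refer everyone" strategy. The function names and the toy labels and risks are assumptions made up for this example; they are not the study's code or data.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit of acting on every case whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(y_true)
    # True positives: flagged cases that do have the outcome.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    # False positives: flagged cases that do not have the outcome.
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Net benefit of acting on everyone, regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy data (NOT from the study): 1 = lung cancer, 0 = no lung cancer.
toy_labels = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
toy_risks = [0.60, 0.02, 0.30, 0.04, 0.10, 0.03, 0.08, 0.01, 0.20, 0.03]

if __name__ == "__main__":
    pt = 0.05  # illustrative threshold, matching the 5% risk level in the abstract
    print("model:    ", net_benefit(toy_labels, toy_risks, pt))
    print("treat all:", net_benefit_treat_all(toy_labels, pt))
```

A model is considered clinically useful at a given threshold when its net benefit exceeds both the "treat all" value and zero (the "treat none" strategy); a full decision curve repeats this calculation over a range of thresholds.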
About the journal:
Cancer Medicine is a peer-reviewed, open access, interdisciplinary journal providing rapid publication of research from global biomedical researchers across the cancer sciences. The journal will consider submissions from all oncologic specialties, including, but not limited to, the following areas:
Clinical Cancer Research:
Translational research ∙ clinical trials ∙ chemotherapy ∙ radiation therapy ∙ surgical therapy ∙ clinical observations ∙ clinical guidelines ∙ genetic consultation ∙ ethical considerations.
Cancer Biology:
Molecular biology ∙ cellular biology ∙ molecular genetics ∙ genomics ∙ immunology ∙ epigenetics ∙ metabolic studies ∙ proteomics ∙ cytopathology ∙ carcinogenesis ∙ drug discovery and delivery.
Cancer Prevention:
Behavioral science ∙ psychosocial studies ∙ screening ∙ nutrition ∙ epidemiology and prevention ∙ community outreach.
Bioinformatics:
Gene expression profiles ∙ gene regulation networks ∙ genome bioinformatics ∙ pathway analysis ∙ prognostic biomarkers.
Cancer Medicine publishes original research articles, systematic reviews, meta-analyses, and research methods papers, along with invited editorials and commentaries. Original research papers must report well-conducted research with conclusions supported by the data presented in the paper.