Artificial Intelligence as a Potential Tool for Predicting Surgical Margin Status in Early Breast Cancer Using Mammographic Specimen Images
David Andras, Radu Alexandru Ilies, Victor Esanu, Stefan Agoston, Tudor Florin Marginean Jumate, George Calin Dindelegan
Journal: Diagnostics, vol. 15, no. 10 (Impact Factor 3.0; JCR Q1, Medicine, General & Internal)
Published: 2025-05-17
DOI: https://doi.org/10.3390/diagnostics15101276
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12109882/pdf/
Citations: 0
Abstract
Background/Objectives: Breast cancer is the most common malignancy among women globally, with an increasing incidence, particularly in younger populations. Achieving complete surgical excision is essential to reduce recurrence. Artificial intelligence (AI), including large language models like ChatGPT, has potential for supporting diagnostic tasks, though its role in surgical oncology remains limited. Methods: This retrospective study evaluated ChatGPT's performance (ChatGPT-4, OpenAI, March 2025) in predicting surgical margin status (R0 or R1) based on intraoperative mammograms of lumpectomy specimens. AI-generated responses were compared with histopathological findings. Performance was evaluated using sensitivity, specificity, accuracy, positive predictive value (PPV), negative predictive value (NPV), F1 score, and Cohen's kappa coefficient. Results: Out of a total of 100 patients, ChatGPT achieved an accuracy of 84.0% in predicting surgical margin status. Sensitivity for identifying R1 cases (incomplete excision) was 60.0%, while specificity for R0 (complete excision) was 86.7%. The positive predictive value (PPV) was 33.3%, and the negative predictive value (NPV) was 95.1%. The F1 score for R1 classification was 0.43, and Cohen's kappa coefficient was 0.34, indicating moderate agreement with histopathological findings. Conclusions: ChatGPT demonstrated moderate accuracy in confirming complete excision but showed limited reliability in identifying incomplete margins. While promising, these findings emphasize the need for domain-specific training and further validation before such models can be implemented in clinical breast cancer workflows.
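The metrics reported above can all be derived from a single 2x2 confusion matrix. The abstract does not report the underlying counts, so the sketch below uses hypothetical counts (6 true positives, 4 false negatives, 12 false positives, 78 true negatives, with R1 as the positive class) chosen only because they sum to 100 patients and reproduce the reported percentages after rounding. The names `ConfusionCounts`, `margin_metrics`, and `ASSUMED_COUNTS` are illustrative and do not come from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 counts with R1 (incomplete excision) as the positive class."""
    tp: int  # model predicts R1, histopathology confirms R1
    fn: int  # model predicts R0, histopathology finds R1
    fp: int  # model predicts R1, histopathology finds R0
    tn: int  # model predicts R0, histopathology confirms R0


def margin_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute the diagnostic metrics named in the abstract."""
    n = c.tp + c.fn + c.fp + c.tn
    accuracy = (c.tp + c.tn) / n
    # Chance agreement for Cohen's kappa: product of the marginal totals
    # for each class, summed over both classes and divided by n squared.
    p_expected = (
        (c.tp + c.fp) * (c.tp + c.fn) + (c.tn + c.fn) * (c.tn + c.fp)
    ) / n**2
    return {
        "accuracy": accuracy,
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "f1": 2 * c.tp / (2 * c.tp + c.fp + c.fn),
        "kappa": (accuracy - p_expected) / (1 - p_expected),
    }


# ASSUMPTION: these counts are inferred for illustration, not reported
# in the abstract. They are one matrix consistent with the stated values.
ASSUMED_COUNTS = ConfusionCounts(tp=6, fn=4, fp=12, tn=78)

metrics = margin_metrics(ASSUMED_COUNTS)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these assumed counts only 10 of 100 specimens are R1, which would explain why the negative predictive value is high while the positive predictive value is low: with few true R1 cases, even a modest number of false positives outweighs the true positives. Interpretation bands for kappa vary by convention; under the commonly cited Landis and Koch bands, 0.21 to 0.40 is labeled "fair" and 0.41 to 0.60 "moderate".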
Diagnostics (Biochemistry, Genetics and Molecular Biology: Clinical Biochemistry)
CiteScore: 4.70
Self-citation rate: 8.30%
Articles published: 2699
Review time: 19.64 days
About the journal:
Diagnostics (ISSN 2075-4418) is an international scholarly open access journal on medical diagnostics. It publishes original research articles, reviews, communications and short notes on the research and development of medical diagnostics. There is no restriction on the length of the papers. Our aim is to encourage scientists to publish their experimental and theoretical research in as much detail as possible. Full experimental and/or methodological details must be provided for research articles.