Rami Elmorsi, Luis D Camacho, David D Krijgh, Heather Lyu, Margaret S Roubaud, Keila Torres, Valerae Lewis, Christina L Roland, Alexander F Mericli
{"title":"Extremity Soft Tissue Sarcoma Reconstruction Nomograms: A Clinicoradiomic, Machine Learning-Powered Predictor of Postoperative Outcomes.","authors":"Rami Elmorsi, Luis D Camacho, David D Krijgh, Heather Lyu, Margaret S Roubaud, Keila Torres, Valerae Lewis, Christina L Roland, Alexander F Mericli","doi":"10.1200/CCI-25-00007","DOIUrl":"https://doi.org/10.1200/CCI-25-00007","url":null,"abstract":"<p><strong>Purpose: </strong>The choice of wound closure modality after limb-sparing extremity soft-tissue sarcoma (eSTS) resection is fraught with uncertainty. Leveraging machine learning and clinicoradiomic data, we developed Sarcoma Reconstruction Nomograms (SARCON), a tool that provides probabilistic estimates of five adverse outcomes on the basis of the selected reconstructive modality.</p><p><strong>Methods: </strong>This retrospective cohort study of limb-sparing eSTS resections integrated clinical variables and radiomic features, including eSTS and limb dimensions. Target outcomes included surgical site infections (SSI), wound dehiscence (WD), seroma formation, and minor and major complications. For each outcome, three machine learning classifiers-Logistic Regression with Lasso regularization, Naïve Bayes, and FasterRisk-were developed and evaluated using 10-fold cross-validation (CV), 50 random 80%-20% splits, leave-one-out CV, and a test data set. The best-performing model for each outcome was used to construct a respective nomogram.</p><p><strong>Results: </strong>A total of 316 limb-sparing eSTS resections were analyzed, predominantly located in the thigh (54%), lower leg (17%), and upper arm (11%). Postoperative outcomes included SSI (12%), WD (16%), seroma formation (8.5%), minor complications (34%), and major complications (25%). 
Logistic Regression with Lasso regularization consistently outperformed the other models across all outcomes, achieving area under the receiver operator curves ranging from 0.83 to 0.93 in all tests.</p><p><strong>Conclusion: </strong>By providing probabilistic estimates of adverse outcomes on the basis of reconstructive modality, SARCON empowers surgeons to anticipate complications and optimize reconstructive strategies.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"9 ","pages":"e2500007"},"PeriodicalIF":3.3,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144276574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Victor Lee, Nicholas S Moore, Joshua Doyle, Daniel Hicks, Patrick Oh, Shari Bodofsky, Sajid Hossain, Abhijit A Patel, Sanjay Aneja, Robert Homer, Henry S Park
{"title":"Prediction of Lymph Node Metastasis in Non-Small Cell Lung Carcinoma Using Primary Tumor Somatic Mutation Data.","authors":"Victor Lee, Nicholas S Moore, Joshua Doyle, Daniel Hicks, Patrick Oh, Shari Bodofsky, Sajid Hossain, Abhijit A Patel, Sanjay Aneja, Robert Homer, Henry S Park","doi":"10.1200/CCI-24-00303","DOIUrl":"https://doi.org/10.1200/CCI-24-00303","url":null,"abstract":"<p><strong>Purpose: </strong>Lymph node metastasis (LNM) significantly affects prognosis and treatment strategies in non-small cell lung cancer (NSCLC). Current diagnostic methods, including imaging and histopathology, have limited sensitivity and specificity. This study aims to develop and evaluate machine learning (ML) models that predict LNM in NSCLC using single-nucleotide polymorphism (SNP) data from The Cancer Genome Atlas.</p><p><strong>Methods: </strong>A cohort of 542 patients with NSCLC with comprehensive SNP data were analyzed. After preprocessing, feature selection was performed using chi-square tests to identify SNPs significantly associated with LNM. Twelve ML models, including Logistic Regression, Naive Bayes, and Support Vector Machines, were trained and evaluated using bootstrapped data sets. Model performance was assessed using metrics such as accuracy, area under the receiver operating characteristic curve (AUC), and F1 score. Shapley additive explanations values were used for feature interpretability, and survival analysis was conducted to assess clinical outcomes.</p><p><strong>Results: </strong>Naive Bayes and Logistic Regression models achieved the highest predictive performance, with median AUCs of 0.93 and 0.91, respectively. Key SNPs, including mutations in <i>TANC2</i>, <i>KCNT2</i>, and <i>CENPF</i>, were consistently identified as predictive features. Survival analysis demonstrated significant differences in outcomes on the basis of model-predicted LNM status (log-rank <i>P</i> = .0268). 
Feature selection improved model accuracy and robustness, highlighting the biological relevance of selected SNPs.</p><p><strong>Conclusion: </strong>ML models leveraging primary tumor SNP data can enhance LNM prediction in NSCLC, outperforming traditional diagnostic methods. These findings underscore the potential of integrating genomics and ML to develop noninvasive biomarkers, enabling precise risk stratification and personalized treatment strategies in oncology.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"9 ","pages":"e2400303"},"PeriodicalIF":3.3,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144188467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jack Gallifant, Shan Chen, Sandeep K Jain, Pedro Moreira, Umit Topaloglu, Hugo J W L Aerts, Jeremy L Warner, William G La Cava, Danielle S Bitterman
{"title":"Reliability of Large Language Model Knowledge Across Brand and Generic Cancer Drug Names.","authors":"Jack Gallifant, Shan Chen, Sandeep K Jain, Pedro Moreira, Umit Topaloglu, Hugo J W L Aerts, Jeremy L Warner, William G La Cava, Danielle S Bitterman","doi":"10.1200/CCI-24-00257","DOIUrl":"https://doi.org/10.1200/CCI-24-00257","url":null,"abstract":"<p><strong>Purpose: </strong>To evaluate the performance and consistency of large language models (LLMs) across brand and generic oncology drug names in various clinical tasks, addressing concerns about potential fluctuations in LLM performance because of subtle phrasing differences that could affect patient care.</p><p><strong>Methods: </strong>This study evaluated three LLMs (GPT-3.5-turbo-0125, GPT-4-turbo, and GPT-4o) using drug names from HemOnc ontology. The assessment included 367 generic-to-brand and 2,516 brand-to-generic pairs, 1,000 drug-drug interaction (DDI) synthetic patient cases, and 2,438 immune-related adverse event (irAE) cases. LLMs were tested on drug name recognition, word association, DDI detection, and irAE diagnosis using both brand and generic drug names.</p><p><strong>Results: </strong>LLMs demonstrated high accuracy in matching brand and generic names (GPT-4o: 97.38% for brand, 94.71% for generic, <i>P</i> < .01). However, they showed significant inconsistencies in word association tasks. GPT-3.5-turbo-0125 exhibited biases favoring brand names for effectiveness (odds ratio [OR], 1.43, <i>P</i> < .05) and being side-effect-free (OR, 1.76, <i>P</i> < .05). DDI detection accuracy was poor across all models (<26%), with no significant differences between brand and generic names. Sentiment analysis revealed significant differences, particularly in GPT-3.5-turbo-0125 (brand mean 0.67, generic mean 0.95, <i>P</i> < .01). 
Consistency in irAE diagnosis varied across models.</p><p><strong>Conclusion: </strong>Despite high proficiency in name-matching, LLMs exhibit inconsistencies when processing brand versus generic drug names in more complex tasks. These findings highlight the need for increased awareness, improved robustness assessment methods, and the development of more consistent systems for handling nomenclature variations in clinical applications of LLMs.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"9 ","pages":"e2400257"},"PeriodicalIF":3.3,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144310800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reply to: Machine-Learning Algorithms and Treatment Response in Advanced Melanoma.","authors":"Richard Brohet, Jan Willem de Groot","doi":"10.1200/CCI-25-00128","DOIUrl":"https://doi.org/10.1200/CCI-25-00128","url":null,"abstract":"","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"9 ","pages":"e2500128"},"PeriodicalIF":3.3,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144509321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Maria Katerina C Alfaro, Christine S Shusted, Teresa Giamboy, Gregory C Kane, Nathaniel R Evans, Brooke M Ruane, Eboni Gatson-Anderson, Emily Muse, Mary McMullen, Anne Marie Kinsey, Sandra Murray, Christopher McNair, Julie A Barta
{"title":"Using an Integrated, Digital Framework to Standardize and Expand a Multisite Lung Cancer Screening Program.","authors":"Maria Katerina C Alfaro, Christine S Shusted, Teresa Giamboy, Gregory C Kane, Nathaniel R Evans, Brooke M Ruane, Eboni Gatson-Anderson, Emily Muse, Mary McMullen, Anne Marie Kinsey, Sandra Murray, Christopher McNair, Julie A Barta","doi":"10.1200/CCI-24-00322","DOIUrl":"10.1200/CCI-24-00322","url":null,"abstract":"<p><strong>Purpose: </strong>Lung cancer screening (LCS) is one of the most potentially impactful interventions of the past two decades for reducing lung cancer mortality. However, no current standard exists in the field for comprehensive data collection and tracking of LCS, despite availability of electronic health records (EHRs) and LCS management tools. In a widely expanding LCS program, harmonization of data becomes critical for decisions surrounding clinical care coordination and operational management.</p><p><strong>Methods: </strong>This article summarizes the implementation of an integrated, digital framework within the Jefferson Health System using the Epic EHR and its customized SmartForms as well as Research Electronic Data Capture application. Leveraging these tools has allowed for standardized documentation across the LCS process continuum for each patient: LCS eligibility, shared decision making, low-dose computed tomography, and follow-up.</p><p><strong>Results: </strong>Since the initial rollout in October 2022, 11 program sites across four regional hubs have adopted this framework. 
A standardized process paired with interoperability between systems has resulted in a centralized data repository, increased communication and transparency within and between program sites, and decreased duplicative or manual processes across the entire LCS program.</p><p><strong>Conclusion: </strong>The resultant digital framework is poised for scale-up and sustainment across the Jefferson Health System, and it can also be replicated across other LCS programs. Future iterations of the current work or adoption by other programs should take into account the complexities of the EHR itself and data provenance to ensure success. Active participation among stakeholders for synchronous coordination of building, implementing, and troubleshooting a comprehensive repository for LCS data can ultimately facilitate measurement of quality metrics and develop future research in early detection of lung cancer.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"9 ","pages":"e2400322"},"PeriodicalIF":3.3,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12208652/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144499102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jeremy Louissaint, Beverly Kyalwazi, John Deng, Timothy P Hogan, Robert W Turer, Elliot B Tapper, David E Gerber, Bryan D Steitz, Sarah R Lieber, Amit G Singal
{"title":"Timing and Method of Patient-Provider Communication for Abnormal Hepatocellular Carcinoma Screening Results in Cirrhosis.","authors":"Jeremy Louissaint, Beverly Kyalwazi, John Deng, Timothy P Hogan, Robert W Turer, Elliot B Tapper, David E Gerber, Bryan D Steitz, Sarah R Lieber, Amit G Singal","doi":"10.1200/CCI-24-00269","DOIUrl":"https://doi.org/10.1200/CCI-24-00269","url":null,"abstract":"<p><strong>Purpose: </strong>Patients with cirrhosis undergo frequent abdominal imaging including semiannual hepatocellular carcinoma (HCC) screening, with results released immediately via the patient portal. We characterized time from patient review to patient-provider communication (PPC) for patients with abnormal liver imaging results.</p><p><strong>Methods: </strong>We identified patients with cirrhosis enrolled in the patient portal with a new abnormal liver lesion (LI-RADS, LR) on ambulatory liver ultrasound (US) or multiphasic computed tomography/magnetic resonance imaging. Imaging findings were grouped into low-risk (US-2, LR-2), intermediate-risk (US-3, LR-3), and high-risk (LR-4, LR-5, LR-M, LR-TIV) results. We extracted three date-time events from the electronic health record, including result release to the patient, patient review of the result, and result-related PPC. We compared communication methods and the median time to PPC after patient review of results between groups.</p><p><strong>Results: </strong>The cohort included 133 patients (median age, 62 years, 56% male) with 34 (25.6%) low-risk, 61 (45.9%) intermediate-risk, and 38 (28.6%) high-risk results. PPC for high-risk results was predominantly via telephone calls (60.5%), whereas portal messages were most commonly used for low- and intermediate-risk results (61.8% and 45.9%, respectively; <i>P</i> < .001). 
For patients who reviewed their result on the portal, most (79.3%) reviewed the result before PPC, among whom the median time between review and PPC was 55.8 (IQR, 22.0-219.0), 167 (IQR, 42.7-324.0), and 47.3 (IQR, 25.8-78.8) hours for low-, intermediate-, and high-risk results, respectively (<i>P</i> = .02).</p><p><strong>Conclusion: </strong>Portal-based review of abnormal imaging results by patients before provider communication is common, including results concerning a new HCC diagnosis. Further studies are needed to evaluate patient-reported outcomes, such as psychological distress, associated with this method of disclosing cancer-related results.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"9 ","pages":"e2400269"},"PeriodicalIF":3.3,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144052306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kelly Merriman, Emily Yu, Andrea Hawkins-Daarud, Kerin Adelson, Kenna Shaw, Dan Shoenthal, Jay Patel, Janna Baganz, Andy Futreal, David A Jaffray, Jose Rivera, Caroline Chung
{"title":"Data Events Are Safety Events: High-Reliability Organization Approach to Improving Data Quality and Safety.","authors":"Kelly Merriman, Emily Yu, Andrea Hawkins-Daarud, Kerin Adelson, Kenna Shaw, Dan Shoenthal, Jay Patel, Janna Baganz, Andy Futreal, David A Jaffray, Jose Rivera, Caroline Chung","doi":"10.1200/CCI-24-00273","DOIUrl":"https://doi.org/10.1200/CCI-24-00273","url":null,"abstract":"","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"9 ","pages":"e2400273"},"PeriodicalIF":3.3,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12058365/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144057168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mariya Lysenkova Wiklander, Dave Zachariah, Olga Krali, Jessica Nordlund
{"title":"Error Reduction in Leukemia Machine Learning Classification With Conformal Prediction.","authors":"Mariya Lysenkova Wiklander, Dave Zachariah, Olga Krali, Jessica Nordlund","doi":"10.1200/CCI-24-00324","DOIUrl":"10.1200/CCI-24-00324","url":null,"abstract":"<p><strong>Purpose: </strong>Recent advances in machine learning have led to the development of classifiers that predict molecular subtypes of acute lymphoblastic leukemia (ALL) using RNA-sequencing (RNA-seq) data. Although these models have shown promising results, they often lack robust performance guarantees. The aim of this study was three-fold: to quantify the uncertainty of these classifiers, to provide prediction sets that control the false-negative rate (FNR), and to perform implicit error reduction by transforming incorrect predictions into uncertain predictions.</p><p><strong>Methods: </strong>Conformal prediction (CP) is a distribution-agnostic framework for generating statistically calibrated prediction sets whose size reflects model uncertainty. In this study, we applied an extension called conformal risk control to three RNA-seq ALL subtype classifiers. Leveraging RNA-seq data from 1,227 patient samples taken at diagnosis, we developed a multiclass conformal predictor ALLCoP, which generates statistically guaranteed FNR-controlled prediction sets.</p><p><strong>Results: </strong>ALLCoP was able to create prediction sets with specified FNR tolerances ranging from 7.5% to 30%. In a validation cohort, ALLCoP successfully reduced the FNR of the ALLIUM RNA-seq ALL subtype classifier from 8.95% to 3.5%. For patients whose subtype was not previously known, the use of ALLCoP was able to reduce the occurrence of empty predictions from 37% to 17%. Notably, up to 34% of the multiple-class prediction sets included the <i>PAX5</i>alt subtype, suggesting that increased prediction set size may reflect secondary aberrations and biological complexity, contributing to classifier uncertainty. 
Finally, ALLCoP was validated on two additional RNA-seq ALL subtype classifiers, ALLSorts and ALLCatchR.</p><p><strong>Conclusion: </strong>Our results highlight the potential of CP in enhancing the use of oncologic RNA-seq subtyping classifiers and also in uncovering additional molecular aberrations of potential clinical importance.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"9 ","pages":"e2400324"},"PeriodicalIF":3.3,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12133051/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144175805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing Strategy for Lung Cancer Screening: From Risk Prediction to Clinical Decision Support.","authors":"Hao Dai, Yu Huang, Xing He, Tiancheng Zhou, Yuxi Liu, Xuhong Zhang, Yi Guo, Jingchuan Guo, Jiang Bian","doi":"10.1200/CCI-24-00291","DOIUrl":"https://doi.org/10.1200/CCI-24-00291","url":null,"abstract":"<p><strong>Purpose: </strong>Low-dose computed tomography (LDCT) screening is effective in reducing lung cancer mortality by detecting the disease at earlier, more treatable stages. However, high false-positive rates and the associated risks of subsequent invasive diagnostic procedures present significant challenges. This study proposes an advanced pipeline that integrates machine learning (ML) and causal inference techniques to optimize lung cancer screening decisions.</p><p><strong>Materials and methods: </strong>Using real-world data from the OneFlorida+ Clinical Research Consortium, we developed ML models to predict individual lung cancer risk and estimate the benefits of LDCT screening. Explainable artificial intelligence techniques were applied to identify key risk factors, ensuring transparency and trust in the model's predictions. Causal ML methods were used to estimate individualized treatment effects of LDCT screening, answering the critical what-if question regarding risk reduction from LDCT.</p><p><strong>Results: </strong>We defined a high-risk cohort of 5,947 patients who underwent LDCT, along with matched controls, to evaluate the framework. Our models demonstrated predictive performance with AUCs of 0.777 and 0.793 for 1-year and 3-year risk predictions, respectively. Causal modeling showed a consistent reduction in lung cancer risk across different subgroups due to LDCT. Specifically, the doubly robust model showed an average risk reduction of 9.5% for males and 12% for females. 
Age-stratified results indicated a 9.5% reduction for individuals age 50-60 years, a 7.5% reduction for those age 60-70 years, and the largest reduction of 15.1% for the 70-80 age group.</p><p><strong>Conclusion: </strong>Integrating ML and causal inference into clinical workflows offers a robust tool for enhancing lung cancer screening. This pipeline provides accurate risk assessments and actionable insights tailored to individuals, empowering clinicians and patients to make informed screening decisions. The differential risk reduction across subgroups highlights the importance of personalized screening in improving outcomes for populations at risk of lung cancer.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"9 ","pages":"e2400291"},"PeriodicalIF":3.3,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12061033/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144057452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adel Shahnam, Udit Nindra, Nadia Hitchen, Joanne Tang, Martin Hong, Jun Hee Hong, George Au-Yeung, Wei Chua, Weng Ng, Ashley M Hopkins, Michael J Sorich
{"title":"Application of Generative Artificial Intelligence for Physician and Patient Oncology Letters-AI-OncLetters.","authors":"Adel Shahnam, Udit Nindra, Nadia Hitchen, Joanne Tang, Martin Hong, Jun Hee Hong, George Au-Yeung, Wei Chua, Weng Ng, Ashley M Hopkins, Michael J Sorich","doi":"10.1200/CCI-24-00323","DOIUrl":"https://doi.org/10.1200/CCI-24-00323","url":null,"abstract":"<p><strong>Purpose: </strong>Although large language models (LLMs) are increasingly used in clinical practice, formal assessments of their quality, accuracy, and effectiveness in medical oncology remain limited. We aimed to evaluate the ability of ChatGPT, an LLM, to generate physician and patient letters from clinical case notes.</p><p><strong>Methods: </strong>Six oncologists created 29 (four training, 25 final) synthetic oncology case notes. Structured prompts for ChatGPT were iteratively developed using the four training cases; once finalized, 25 physician-directed and patient-directed letters were generated. These underwent evaluation by expert consumers and oncologists for accuracy, relevance, and readability using Likert scales. The patient letters were also assessed with the Patient Education Materials Assessment Tool for Print (PEMAT-P), Flesch Reading Ease, and Simple Measure of Gobbledygook index.</p><p><strong>Results: </strong>Among physician-to-physician letters, 95% (119/125) of oncologists agreed they were accurate, comprehensive, and relevant, with no safety concerns noted. These letters demonstrated precise documentation of history, investigations, and treatment plans and were logically and concisely structured. Patient-directed letters achieved a mean Flesch Reading Ease score of 73.3 (seventh-grade reading level) and a PEMAT-P score above 80%, indicating high understandability. Consumer reviewers found them clear and appropriate for patient communication. 
Some omissions of details (eg, side effects), stylistic inconsistencies, and repetitive phrasing were identified, although no clinical safety issues emerged. Seventy-two percent (90/125) of consumers expressed willingness to receive artificial intelligence (AI)-generated patient letters.</p><p><strong>Conclusion: </strong>ChatGPT, when guided by structured prompts, can generate high-quality letters that align with clinical and patient communication standards. No clinical safety concerns were identified, although addressing occasional omissions and improving natural language flow could enhance their utility in practice. Further studies comparing AI-generated and human-written letters are recommended.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"9 ","pages":"e2400323"},"PeriodicalIF":3.3,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144032285","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}