{"title":"Automated Identification of Breast Cancer Relapse in Computed Tomography Reports Using Natural Language Processing.","authors":"Jaimie J Lee, Andres Zepeda, Gregory Arbour, Kathryn V Isaac, Raymond T Ng, Alan M Nichol","doi":"10.1200/CCI.24.00107","DOIUrl":"10.1200/CCI.24.00107","url":null,"abstract":"<p><strong>Purpose: </strong>Breast cancer relapses are rarely collected by cancer registries because of logistical and financial constraints. Hence, we investigated natural language processing (NLP), enhanced with state-of-the-art deep learning transformer tools and large language models, to automate relapse identification in the text of computed tomography (CT) reports.</p><p><strong>Methods: </strong>We analyzed follow-up CT reports from patients diagnosed with breast cancer between January 1, 2005, and December 31, 2014. The reports were curated and annotated for the presence or absence of local, regional, and distant breast cancer relapses. We performed 10-fold cross-validation to evaluate models identifying different types of relapses in CT reports. Model performance was assessed with classification metrics, reported with 95% confidence intervals.</p><p><strong>Results: </strong>In our data set of 1,445 CT reports, 799 (55.3%) described any relapse, 72 (5.0%) local relapses, 97 (6.7%) regional relapses, and 743 (51.4%) distant relapses. The any-relapse model achieved an accuracy of 89.6% (87.8-91.1), with a sensitivity of 93.2% (91.4-94.9) and a specificity of 84.2% (80.9-87.1). The local relapse model achieved an accuracy of 94.6% (93.3-95.7), a sensitivity of 44.4% (32.8-56.3), and a specificity of 97.2% (96.2-98.0). The regional relapse model showed an accuracy of 93.6% (92.3-94.9), a sensitivity of 70.1% (60.0-79.1), and a specificity of 95.3% (94.2-96.5). 
Finally, the distant relapse model demonstrated an accuracy of 88.1% (86.2-89.7), a sensitivity of 91.8% (89.9-93.8), and a specificity of 83.7% (80.5-86.4).</p><p><strong>Conclusion: </strong>We developed NLP models to identify local, regional, and distant breast cancer relapses from CT reports. Automating the identification of breast cancer relapses can enhance data collection about patient outcomes.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"8 ","pages":"e2400107"},"PeriodicalIF":3.3,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11670918/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142869773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
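The record above reports accuracy, sensitivity, and specificity with 95% confidence intervals for each relapse model. A minimal sketch of how such report-level metrics can be computed from annotated labels — assuming Wilson score intervals, since the abstract does not state which CI method was used, and with `binary_metrics`/`wilson_ci` as illustrative names:

```python
import math

def wilson_ci(successes, n, z=1.96):
    """95% Wilson score interval for a binomial proportion."""
    if n == 0:
        return (0.0, 0.0)
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return (center - half, center + half)

def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity with Wilson 95% CIs.
    y_true/y_pred: 1 = relapse described in the report, 0 = no relapse."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    n = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / n, "accuracy_ci": wilson_ci(tp + tn, n),
        "sensitivity": tp / (tp + fn), "sensitivity_ci": wilson_ci(tp, tp + fn),
        "specificity": tn / (tn + fp), "specificity_ci": wilson_ci(tn, tn + fp),
    }
```

The asymmetric sensitivities reported for the local (44.4%) versus any-relapse (93.2%) models illustrate why per-class sensitivity with CIs, not accuracy alone, is the informative readout for rare classes.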
{"title":"Implementation Strategy for Artificial Intelligence in Radiotherapy: Can Implementation Science Help?","authors":"Rachelle Swart, Liesbeth Boersma, Rianne Fijten, Wouter van Elmpt, Paul Cremers, Maria J G Jacobs","doi":"10.1200/CCI.24.00101","DOIUrl":"10.1200/CCI.24.00101","url":null,"abstract":"<p><strong>Purpose: </strong>Artificial intelligence (AI) applications in radiotherapy (RT) are expected to save time and improve quality, but implementation remains limited. Therefore, we used implementation science to develop a format for designing an implementation strategy for AI. This study aimed to (1) apply this format to develop an AI implementation strategy for our center; (2) identify insights gained to enhance AI implementation using this format; and (3) assess the feasibility and acceptability of this format to design a center-specific implementation strategy for departments aiming to implement AI.</p><p><strong>Methods: </strong>We created an AI-implementation strategy for our own center using implementation science methods. This included a stakeholder analysis, literature review, and interviews to identify facilitators and barriers, and designed strategies to overcome the barriers. These methods were subsequently used in a workshop with teams from seven Dutch RT centers to develop their own AI-implementation plans. The applicability, appropriateness, and feasibility were evaluated by the workshop participants, and relevant insights for AI implementation were summarized.</p><p><strong>Results: </strong>The stakeholder analysis identified internal (physicians, physicists, RT technicians, information technology, and education) and external (patients and representatives) stakeholders. Barriers and facilitators included concerns about opacity, privacy, data quality, legal aspects, knowledge, trust, stakeholder involvement, ethics, and multidisciplinary collaboration, all integrated into our implementation strategy. 
The workshop evaluation showed high acceptability (18 participants [90%]), appropriateness (17 participants [85%]), and feasibility (15 participants [75%]) of the implementation strategy. Sixteen participants fully agreed with the format.</p><p><strong>Conclusion: </strong>Our study highlights the need for a collaborative approach to implement AI in RT. We designed a strategy to overcome organizational challenges, improve AI integration, and enhance patient care. Workshop feedback indicates the proposed methods are useful for multiple RT centers. Insights gained by applying the methods highlight the importance of multidisciplinary collaboration in the development and implementation of AI.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"8 ","pages":"e2400101"},"PeriodicalIF":3.3,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11670909/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142869774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparative Analysis of Generative Pre-Trained Transformer Models in Oncogene-Driven Non-Small Cell Lung Cancer: Introducing the Generative Artificial Intelligence Performance Score.","authors":"Zacharie Hamilton, Aseem Aseem, Zhengjia Chen, Noor Naffakh, Natalie M Reizine, Frank Weinberg, Shikha Jain, Larry G Kessler, Vijayakrishna K Gadi, Christopher Bun, Ryan H Nguyen","doi":"10.1200/CCI.24.00123","DOIUrl":"10.1200/CCI.24.00123","url":null,"abstract":"<p><strong>Purpose: </strong>Precision oncology in non-small cell lung cancer (NSCLC) relies on biomarker testing for clinical decision making. Despite its importance, challenges like the lack of genomic oncology training, nonstandardized biomarker reporting, and a rapidly evolving treatment landscape hinder its practice. Generative artificial intelligence (AI), such as ChatGPT, offers promise for enhancing clinical decision support. Effective performance metrics are crucial to evaluate these models' accuracy and their propensity for producing incorrect or hallucinated information. We assessed various ChatGPT versions' ability to generate accurate next-generation sequencing reports and treatment recommendations for NSCLC, using a novel Generative AI Performance Score (G-PS), which considers accuracy, relevancy, and hallucinations.</p><p><strong>Methods: </strong>We queried ChatGPT versions for first-line NSCLC treatment recommendations with a Food and Drug Administration-approved targeted therapy, using a zero-shot prompt approach for eight oncogenes. Responses were assessed against National Comprehensive Cancer Network (NCCN) guidelines for accuracy, relevance, and hallucinations, with G-PS calculating scores from -1 (all hallucinations) to 1 (fully NCCN-compliant recommendations).
G-PS was designed as a composite measure with a base score for correct recommendations (weighted for preferred treatments) and a penalty for hallucinations.</p><p><strong>Results: </strong>Analyzing 160 responses, generative pre-trained transformer (GPT)-4 outperformed GPT-3.5, showing a higher base score (90% <i>v</i> 60%; <i>P</i> < .01) and fewer hallucinations (34% <i>v</i> 53%; <i>P</i> < .01). GPT-4's overall G-PS was significantly higher (0.34 <i>v</i> -0.15; <i>P</i> < .01), indicating superior performance.</p><p><strong>Conclusion: </strong>This study highlights the rapid improvement of generative AI in matching treatment recommendations with biomarkers in precision oncology. Although the rate of hallucinations improved in the GPT-4 model, future generative AI use in clinical care requires high levels of accuracy with minimal to no room for hallucinations. The G-PS represents a novel metric quantifying generative AI utility in health care compared with national guidelines, with potential adaptation beyond precision oncology.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"8 ","pages":"e2400123"},"PeriodicalIF":3.3,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11634130/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142814870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Assessing Large Language Models for Oncology Data Inference From Radiology Reports.","authors":"Li-Ching Chen, Travis Zack, Arda Demirci, Madhumita Sushil, Brenda Miao, Corynn Kasap, Atul Butte, Eric A Collisson, Julian C Hong","doi":"10.1200/CCI.24.00126","DOIUrl":"https://doi.org/10.1200/CCI.24.00126","url":null,"abstract":"<p><strong>Purpose: </strong>We examined the effectiveness of proprietary and open large language models (LLMs) in detecting disease presence, location, and treatment response in pancreatic cancer from radiology reports.</p><p><strong>Methods: </strong>We analyzed 203 deidentified radiology reports, manually annotated for disease status, location, and indeterminate nodules needing follow-up. Using generative pre-trained transformer (GPT)-4, GPT-3.5-turbo, and open models such as Gemma-7B and Llama3-8B, we employed strategies such as ablation and prompt engineering to boost accuracy. Discrepancies between human and model interpretations were reviewed by a secondary oncologist.</p><p><strong>Results: </strong>Among 164 patients with pancreatic tumors, GPT-4 showed the highest accuracy in inferring disease status, achieving 75.5% correctness (F1-micro). Open models Mistral-7B and Llama3-8B performed comparably, with accuracies of 68.6% and 61.4%, respectively. Mistral-7B excelled in deriving correct inferences from objective findings directly. Most tested models demonstrated proficiency in identifying disease-containing anatomic locations from a list of choices, with GPT-4 and Llama3-8B showing near-parity in precision and recall for disease site identification. However, open models struggled with differentiating benign from malignant postsurgical changes, affecting their precision in identifying findings indeterminate for cancer.
A secondary review occasionally favored GPT-3.5's interpretations, indicating the variability in human judgment.</p><p><strong>Conclusion: </strong>LLMs, especially GPT-4, are proficient in deriving oncologic insights from radiology reports. Their performance is enhanced by effective summarization strategies, demonstrating their potential in clinical support and health care analytics. This study also underscores the possibility of zero-shot open model utility in environments where proprietary models are restricted. Finally, by providing a set of annotated radiology reports, this paper presents a valuable data set for further LLM research in oncology.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"8 ","pages":"e2400126"},"PeriodicalIF":3.3,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142814650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
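The headline metric above is F1-micro, which pools true positives, false positives, and false negatives across all disease-status classes before computing F1; for a single-label multiclass task like this one it reduces to plain accuracy. A minimal sketch (the class labels are illustrative, not the paper's label set):

```python
def f1_micro(y_true, y_pred, labels):
    """Micro-averaged F1: pool per-class TP/FP/FN, then compute F1 once.
    For single-label multiclass prediction this equals accuracy."""
    tp = fp = fn = 0
    for c in labels:
        tp += sum(1 for t, p in zip(y_true, y_pred) if p == c and t == c)
        fp += sum(1 for t, p in zip(y_true, y_pred) if p == c and t != c)
        fn += sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Micro averaging weights every report equally, so frequent statuses dominate; macro averaging (mean of per-class F1) would instead surface failures on rare statuses.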
{"title":"Prediction of Hepatocellular Carcinoma After Hepatitis C Virus Sustained Virologic Response Using a Random Survival Forest Model.","authors":"Hikaru Nakahara, Atsushi Ono, C Nelson Hayes, Yuki Shirane, Ryoichi Miura, Yasutoshi Fujii, Serami Murakami, Kenji Yamaoka, Hauri Bao, Shinsuke Uchikawa, Hatsue Fujino, Eisuke Murakami, Tomokazu Kawaoka, Daiki Miki, Masataka Tsuge, Shiro Oka","doi":"10.1200/CCI.24.00108","DOIUrl":"https://doi.org/10.1200/CCI.24.00108","url":null,"abstract":"<p><strong>Purpose: </strong>Post-sustained virologic response (SVR) screening following clinical guidelines does not address individual risk of hepatocellular carcinoma (HCC). Our aim is to provide tailored screening for patients using machine learning to predict HCC incidence after SVR.</p><p><strong>Methods: </strong>Using clinical data from 1,028 SVR patients, we developed an HCC prediction model using a random survival forest (RSF). Model performance was assessed using Harrell's c-index and validated in an independent cohort of 737 SVR patients. Shapley additive explanation (SHAP) facilitated feature quantification, whereas optimal cutoffs were determined using maximally selected rank statistics. We used Kaplan-Meier analysis to compare cumulative HCC incidence between risk groups.</p><p><strong>Results: </strong>We achieved c-index scores and 95% CIs of 0.90 (0.85 to 0.94) and 0.80 (0.74 to 0.85) in the derivation and validation cohorts, respectively, in a model using platelet count, gamma-glutamyl transpeptidase, sex, age, and ALT. Stratification resulted in four risk groups: low, intermediate, high, and very high.
The 5-year cumulative HCC incidence rates and 95% CIs for these groups were as follows: derivation: 0% (0 to 0), 3.8% (0.6 to 6.8), 26.2% (17.2 to 34.3), and 54.2% (20.2 to 73.7), respectively, and validation: 0.7% (0 to 1.6), 7.1% (2.7 to 11.3), 5.2% (0 to 10.8), and 28.6% (0 to 55.3), respectively.</p><p><strong>Conclusion: </strong>The integration of RSF and SHAP enabled accurate HCC risk classification after SVR, which may facilitate individualized HCC screening strategies and more cost-effective care.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"8 ","pages":"e2400108"},"PeriodicalIF":3.3,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142856659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
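The RSF model above is evaluated with Harrell's c-index, the standard discrimination measure for censored survival data: the fraction of usable patient pairs in which the patient who develops HCC earlier was assigned the higher predicted risk. A minimal from-scratch sketch (survival libraries such as scikit-survival provide optimized versions):

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.
    times: follow-up time per patient; events: 1 = HCC observed,
    0 = censored; risk_scores: higher = predicted higher risk.
    A pair (i, j) is usable only when i's event is observed before
    j's follow-up time; ties in risk count as half-concordant."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / usable if usable else float("nan")
```

A value of 0.90, as in the derivation cohort, means 90% of usable pairs are ranked correctly; 0.5 is chance-level discrimination.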
{"title":"Real-World Outcomes in Patients With Metastatic Renal Cell Carcinoma Treated With First-Line Nivolumab Plus Ipilimumab in the United States.","authors":"Gurjyot K Doshi, Andrew J Osterland, Ping Shi, Annette Yim, Viviana Del Tejo, Sarah B Guttenplan, Samantha Eiffert, Xin Yin, Lisa Rosenblatt, Paul R Conkling","doi":"10.1200/CCI.24.00132","DOIUrl":"10.1200/CCI.24.00132","url":null,"abstract":"<p><strong>Purpose: </strong>Nivolumab plus ipilimumab (NIVO + IPI) is a first-in-class combination immunotherapy for the treatment of intermediate- or poor (I/P)-risk advanced or metastatic renal cell carcinoma (mRCC). Currently, there are limited real-world data regarding clinical effectiveness beyond 12-24 months from treatment initiation. In this real-world study, treatment patterns and clinical outcomes were evaluated for NIVO + IPI in a community oncology setting.</p><p><strong>Methods: </strong>A retrospective analysis using electronic medical record data from The US Oncology Network examined patients with I/P-risk clear cell mRCC who initiated first-line (1L) NIVO + IPI between January 4, 2018, and December 31, 2019, with follow-up until June 30, 2022. Baseline demographics, clinical characteristics, treatment patterns, clinical effectiveness, and safety outcomes were assessed descriptively. Overall survival (OS) and real-world progression-free survival (rwPFS) were analyzed using Kaplan-Meier methods.</p><p><strong>Results: </strong>Among 187 patients identified (median follow-up, 22.4 months), with median age 63 (range, 30-89) years, 74 (39.6%) patients had poor risk and 37 (19.8%) patients had Eastern Cooperative Oncology Group performance status score ≥2. Of 86 patients who received second-line therapy, 54.7% received cabozantinib and 10.5% received pazopanib. The median (95% CI) OS and rwPFS were 38.4 (24.7-46.1) months and 11.1 (7.5-15.0) months, respectively. 
Treatment-related adverse events (TRAEs) were reported in 89 (47.6%) patients, including fatigue (n = 25, 13.4%) and rash (n = 19, 10.2%).</p><p><strong>Conclusion: </strong>This study provides data to support the understanding of the real-world utilization and long-term effectiveness of 1L NIVO + IPI in patients with I/P-risk mRCC. TRAE rates were low relative to clinical trials.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"8 ","pages":"e2400132"},"PeriodicalIF":3.3,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11670916/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142869775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
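The median OS and rwPFS figures above come from Kaplan-Meier analysis, which handles patients censored before an event (here, before death or progression). A minimal sketch of the product-limit estimator and the usual read-off of median survival:

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimator.
    times: time to event or censoring; events: 1 = event observed
    (death/progression), 0 = censored. Returns [(t, S(t))] at each
    event time; censored patients leave the risk set without a drop."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk = len(times)
    surv = 1.0
    curve = []
    i = 0
    while i < len(order):
        t = times[order[i]]
        deaths = removed = 0
        while i < len(order) and times[order[i]] == t:
            if events[order[i]]:
                deaths += 1
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve

def median_survival(curve):
    """First time the survival estimate drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached within follow-up
```

Note that with a median follow-up of 22.4 months, a median OS of 38.4 months is only estimable because enough patients remain at risk late on the curve, which is why the reported CI (24.7-46.1) is wide.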
{"title":"Implementing Cancer Registry Data With the PCORnet Common Data Model: The Greater Plains Collaborative Experience.","authors":"Bradley D McDowell, Michael A O'Rorke, Mary C Schroeder, Elizabeth A Chrischilles, Christine M Spinka, Lemuel R Waitman, Kelechi Anuforo, Alejandro Araya, Haddyjatou Bah, Jackson Barlocker, Sravani Chandaka, Lindsay G Cowell, Carol R Geary, Snehil Gupta, Benjamin D Horne, Boyd M Knosp, Albert M Lai, Vasanthi Mandhadi, Abu Saleh Mohammad Mosa, Phillip Reeder, Giyung Ryu, Brian Shukwit, Claire Smith, Alexander J Stoddard, Mahanazuddin Syed, Shorabuddin Syed, Bradley W Taylor, Jeffrey J VanWormer","doi":"10.1200/CCI-24-00196","DOIUrl":"10.1200/CCI-24-00196","url":null,"abstract":"<p><strong>Purpose: </strong>Electronic health records (EHRs) comprise a rich source of real-world data for cancer studies, but they often lack critical structured data elements such as diagnosis date and disease stage. Fortunately, such concepts are available from hospital cancer registries. We describe experiences from integrating cancer registry data with EHR and billing data in an interoperable data model across a multisite clinical research network.</p><p><strong>Methods: </strong>After sites implemented cancer registry data into a tumor table compatible with the PCORnet Common Data Model (CDM), distributed queries were performed to assess quality issues. After remediation of quality issues, another query produced descriptive frequencies of cancer types and demographic characteristics. This included linked BMI. We also report two current use cases of the new resource.</p><p><strong>Results: </strong>Eleven sites implemented the tumor table, yielding a resource with data for 572,902 tumors. Institutional and technical barriers were surmounted to accomplish this. 
Variations in racial and ethnic distributions across the sites were observed; the percent of tumors among Black patients ranged from <1% to 15% across sites, and the percent of tumors among Hispanic patients ranged from 1% to 46% across sites. Current use cases include a pragmatic prospective cohort study of a rare cancer and a retrospective cohort study leveraging body size and chemotherapy dosing.</p><p><strong>Conclusion: </strong>Integrating cancer registry data with the PCORnet CDM across multiple institutions creates a powerful resource for cancer studies. It provides a wider array of structured, cancer-relevant concepts, and it allows investigators to examine variability in those concepts across many treatment environments. Having the CDM tumor table in place enhances the impact of the network's effectiveness for real-world cancer research.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"8 ","pages":"e2400196"},"PeriodicalIF":3.3,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11658786/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142848405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Metastatic Versus Localized Disease as Inclusion Criteria That Can Be Automatically Extracted From Randomized Controlled Trials Using Natural Language Processing.","authors":"Paul Windisch, Fabio Dennstädt, Carole Koechli, Robert Förster, Christina Schröder, Daniel M Aebersold, Daniel R Zwahlen","doi":"10.1200/CCI-24-00150","DOIUrl":"https://doi.org/10.1200/CCI-24-00150","url":null,"abstract":"<p><strong>Purpose: </strong>Extracting inclusion and exclusion criteria in a structured, automated fashion remains a challenge to developing better search functionalities or automating systematic reviews of randomized controlled trials in oncology. The question \"Did this trial enroll patients with localized disease, metastatic disease, or both?\" could be used to narrow down the number of potentially relevant trials when conducting a search.</p><p><strong>Methods: </strong>Six hundred trials from high-impact medical journals were classified depending on whether they allowed for the inclusion of patients with localized and/or metastatic disease. Five hundred trials were used to develop and validate three different models, with 100 trials held out for testing. The test set was also used to evaluate the performance of GPT-4o in the same task.</p><p><strong>Results: </strong>In the test set, a rule-based system using regular expressions achieved F1 scores of 0.72 for the prediction of whether the trial allowed for the inclusion of patients with localized disease and 0.77 for metastatic disease. A transformer-based machine learning (ML) model achieved F1 scores of 0.97 and 0.88, respectively. A combined approach where the rule-based system was allowed to overrule the ML model achieved F1 scores of 0.97 and 0.89, respectively. GPT-4o achieved F1 scores of 0.87 and 0.92, respectively.</p><p><strong>Conclusion: </strong>Automatic classification of cancer trials with regard to the inclusion of patients with localized and/or metastatic disease is feasible.
Turning the extraction of trial criteria into classification problems could, in selected cases, improve text-mining approaches in evidence-based medicine. Increasingly large language models can reduce or eliminate the need for previous training on the task at the expense of increased computational power and, in turn, cost.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"8 ","pages":"e2400150"},"PeriodicalIF":3.3,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142741179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
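The rule-based baseline in this record is a regular-expression classifier over trial criteria text, scored with F1. The abstract does not publish the actual rule set, so the patterns and keywords below are invented for illustration; they also show one pitfall such rules must handle ("non-metastatic" must not trigger the metastatic rule):

```python
import re

# Illustrative patterns only; the trial's real rule set is not in the abstract.
# The lookbehind keeps "non-metastatic" from matching the metastatic rule.
METASTATIC_PAT = re.compile(r"(?<!non-)\b(metastatic|metastas\w+|stage\s+iv)\b", re.I)
LOCALIZED_PAT = re.compile(r"\b(locali[sz]ed|early[- ]stage|non[- ]?metastatic)\b", re.I)

def allows_metastatic(criteria_text):
    return bool(METASTATIC_PAT.search(criteria_text))

def allows_localized(criteria_text):
    return bool(LOCALIZED_PAT.search(criteria_text))

def f1(y_true, y_pred):
    """F1 for a binary label (e.g., 'trial allows metastatic disease')."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0
```

In the combined approach reported above, high-precision rules like these are allowed to overrule the transformer model's prediction, which nudged the metastatic-disease F1 from 0.88 to 0.89.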
{"title":"Enhancing Thyroid Pathology With Artificial Intelligence: Automated Data Extraction From Electronic Health Reports Using RUBY.","authors":"Dorian Culié, Renaud Schiappa, Sara Contu, Eva Seutin, Tanguy Pace-Loscos, Gilles Poissonnet, Agathe Villarme, Alexandre Bozec, Emmanuel Chamorey","doi":"10.1200/CCI.23.00263","DOIUrl":"https://doi.org/10.1200/CCI.23.00263","url":null,"abstract":"<p><strong>Purpose: </strong>Thyroid nodules are common in the general population, and assessing their malignancy risk is the initial step in care. Surgical exploration remains the sole definitive option for indeterminate nodules. Extensive database access is crucial for improving this initial assessment. Our objective was to develop an automated process using convolutional neural networks (CNNs) to extract and structure biomedical insights from electronic health reports (EHRs) in a large thyroid pathology cohort.</p><p><strong>Materials and methods: </strong>We randomly selected 1,500 patients with thyroid pathology from our cohort for model development and an additional 100 for testing. We then divided the cohort of 1,500 patients into training (70%) and validation (30%) sets. We used EHRs from initial surgeon visits, preanesthesia visits, ultrasound, surgery, and anatomopathology reports. We selected 42 variables of interest and had them manually annotated by a clinical expert. We developed RUBY-THYRO using six distinct CNN models from SpaCy, supplemented with keyword extraction rules and postprocessing. Evaluation against a gold standard database included calculating precision, recall, and F1 score.</p><p><strong>Results: </strong>Performance remained consistent across the test and validation sets, with the majority of variables (30/42) achieving performance metrics exceeding 90% for all metrics in both sets. 
Results differed according to the variables; pathologic tumor stage score achieved 100% in precision, recall, and F1 score, versus 45%, 28%, and 32% for the number of nodules in the test set, respectively. Surgical and preanesthesia reports demonstrated particularly high performance.</p><p><strong>Conclusion: </strong>Our study successfully implemented a CNN-based natural language processing (NLP) approach for extracting and structuring data from various EHRs in thyroid pathology. This highlights the potential of artificial intelligence-driven NLP techniques for extensive and cost-effective data extraction, paving the way for creating comprehensive, hospital-wide data warehouses.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"8 ","pages":"e2300263"},"PeriodicalIF":3.3,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142830836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
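RUBY-THYRO is evaluated per extracted variable against a gold-standard database, with precision, recall, and F1 reported separately for each of the 42 variables (100% for pathologic tumor stage versus 45/28/32% for number of nodules). A minimal sketch of that per-variable scoring, assuming report-level dicts of extracted values and exact-match comparison (the variable names in the test are taken from the abstract; the matching convention is an assumption):

```python
def extraction_scores(gold, extracted):
    """Per-variable precision/recall/F1 for report-level extraction.
    gold/extracted: one dict per report mapping variable -> value
    (absent key = not found). Convention used here: an extracted value
    that disagrees with a present annotation counts as both a false
    positive and a missed gold value."""
    variables = {v for report in gold for v in report}
    scores = {}
    for var in variables:
        tp = fp = fn = 0
        for g, e in zip(gold, extracted):
            gv, ev = g.get(var), e.get(var)
            if ev is not None and ev == gv:
                tp += 1
            elif ev is not None:
                fp += 1
                if gv is not None:
                    fn += 1
            elif gv is not None:
                fn += 1
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[var] = (precision, recall, f1)
    return scores
```

Scoring each variable separately is what exposes the spread the abstract reports: a categorical, templated field like tumor stage can hit 100% while a free-text count like nodule number lags far behind.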
{"title":"Machine Learning to Predict the Individual Risk of Treatment-Relevant Toxicity for Patients With Breast Cancer Undergoing Neoadjuvant Systemic Treatment.","authors":"Lie Cai, Thomas M Deutsch, Chris Sidey-Gibbons, Michelle Kobel, Fabian Riedel, Katharina Smetanay, Carlo Fremd, Laura Michel, Michael Golatta, Joerg Heil, Andreas Schneeweiss, André Pfob","doi":"10.1200/CCI.24.00010","DOIUrl":"10.1200/CCI.24.00010","url":null,"abstract":"<p><strong>Purpose: </strong>Toxicity to systemic cancer treatment represents a major anxiety for patients and a challenge to treatment plans. We aimed to develop machine learning algorithms for the upfront prediction of an individual's risk of experiencing treatment-relevant toxicity during the course of treatment.</p><p><strong>Methods: </strong>Clinical records were retrieved from a single-center, consecutive cohort of patients who underwent neoadjuvant treatment for early breast cancer. We developed and validated machine learning algorithms to predict grade 3 or 4 toxicity (anemia, neutropenia, deviation of liver enzymes, nephrotoxicity, thrombopenia, electrolyte disturbance, or neuropathy). We used 10-fold cross-validation to develop two algorithms (logistic regression with elastic net penalty [GLM] and support vector machines [SVMs]). Algorithm predictions were compared with documented toxicity events and diagnostic performance was evaluated via area under the curve (AUROC).</p><p><strong>Results: </strong>A total of 590 patients were identified, 432 in the development set and 158 in the validation set. The median age was 51 years, and 55.8% (329 of 590) experienced grade 3 or 4 toxicity. 
The performance improved significantly when adding referenced treatment information (referenced regimen, referenced summation dose intensity product) in addition to patient and tumor variables: GLM AUROC 0.59 versus 0.75, <i>P</i> = .02; SVM AUROC 0.64 versus 0.75, <i>P</i> = .01.</p><p><strong>Conclusion: </strong>The individual risk of treatment-relevant toxicity can be predicted using machine learning algorithms. We demonstrate a promising way to improve efficacy and facilitate proactive toxicity management of systemic cancer treatment.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"8 ","pages":"e2400010"},"PeriodicalIF":3.3,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11670908/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142883088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
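The toxicity models above are compared by area under the ROC curve (AUROC: 0.59 versus 0.75 for the GLM, and so on). AUROC can be computed without plotting a curve via the rank-sum (Mann-Whitney U) identity, sketched here from scratch:

```python
def auroc(y_true, scores):
    """AUROC via the Mann-Whitney identity: the probability that a
    randomly chosen positive case (grade 3/4 toxicity) receives a
    higher predicted risk than a randomly chosen negative case,
    counting ties as 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

On this scale, the jump from 0.59 to 0.75 after adding treatment-regimen features means the enriched model correctly ranks a toxicity case above a non-toxicity case in three out of four random pairs, versus barely better than chance before.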