Evaluation of machine learning methods for the retrospective detection of ovarian cancer recurrences from chemotherapy data
A.D. Coles, C.D. McInerney, K. Zucker, S. Cheeseman, O.A. Johnson, G. Hall
ESMO Real World Data and Digital Oncology, vol. 4, Article 100038, published 2024-05-01. DOI: 10.1016/j.esmorw.2024.100038. https://www.sciencedirect.com/science/article/pii/S294982012400016X
Abstract
Background
Cancer recurrences are poorly recorded in electronic health records worldwide, which hinders research into the efficacy of cancer treatments. Currently, recurrence/progression diagnosis dates are identified retrospectively by staff who manually review patients' health records, a process that is expensive, time-consuming, and inefficient. Machine learning models may expedite this review and facilitate the assessment of alternative cancer therapies.
Materials and methods
This paper evaluates the use of four machine learning models (random forests, conditional inference trees, decision trees, and logistic regression) in identifying proxy dates of epithelial ovarian cancer recurrence/progression from chemotherapy data, in 531 patients at Leeds Teaching Hospital Trust.
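The abstract does not describe how chemotherapy records are turned into model inputs. One plausible preprocessing step, sketched below in Python purely as a hypothetical illustration, is to flag administrations that could mark the start of a new line of treatment (the patient's first administration, a long gap, or a regimen change) as candidate events. Each candidate carries simple features that a classifier such as a random forest or decision tree could then label as recurrence/progression or not. The record layout, the `candidate_events` function, the feature set, the regimen names, and the 90-day threshold are all assumptions, not the authors' pipeline.

```python
from dataclasses import dataclass
from datetime import date
from itertools import groupby
from typing import List, Tuple

# Hypothetical input layout: one row per chemotherapy administration.
Admin = Tuple[str, date, str]  # (patient_id, administration_date, regimen)


@dataclass
class CandidateEvent:
    patient_id: str
    event_date: date
    gap_days: int          # days since the previous administration (-1 if none)
    regimen_changed: bool  # regimen differs from the previous administration
    admin_index: int       # 0-based position in the patient's administration history


def candidate_events(admins: List[Admin], min_gap_days: int = 90) -> List[CandidateEvent]:
    """Flag administrations that may start a new treatment line.

    An administration becomes a candidate if it is the patient's first, follows
    a gap longer than min_gap_days, or switches regimen. The threshold is an
    illustrative assumption, not a value taken from the study.
    """
    events: List[CandidateEvent] = []
    # Sorting the tuples orders rows by patient, then by date.
    for pid, rows in groupby(sorted(admins), key=lambda r: r[0]):
        prev_date, prev_regimen = None, None
        for i, (_, d, regimen) in enumerate(rows):
            if prev_date is None:
                events.append(CandidateEvent(pid, d, -1, False, i))
            else:
                gap = (d - prev_date).days
                changed = regimen != prev_regimen
                if gap > min_gap_days or changed:
                    events.append(CandidateEvent(pid, d, gap, changed, i))
            prev_date, prev_regimen = d, regimen
    return events


if __name__ == "__main__":
    # Hypothetical toy records, not study data.
    admins = [
        ("P1", date(2020, 1, 6), "carboplatin+paclitaxel"),
        ("P1", date(2020, 1, 27), "carboplatin+paclitaxel"),
        ("P1", date(2020, 2, 17), "carboplatin+paclitaxel"),
        ("P1", date(2021, 3, 1), "carboplatin+paclitaxel"),  # long gap
        ("P2", date(2020, 5, 4), "carboplatin"),
        ("P2", date(2020, 5, 25), "carboplatin"),
        ("P2", date(2020, 6, 15), "paclitaxel"),  # regimen change
    ]
    for ev in candidate_events(admins):
        print(ev)
```

Keeping the gap threshold as a parameter makes it easy to see how the number of candidate events changes under different assumptions before any model is trained on them.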
Results
The random forest achieved the highest F1 score, 0.941 (95% confidence interval 0.916–0.968), when identifying recurrence events. The classifications of both the random forest and the decision tree closely conform to chart-reviewed time to next treatment, which serves as a surrogate for recurrence-free survival. Additionally, all models reached an F1 score above 0.940 when identifying patients whose cancer recurred/progressed.
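The abstract reports F1 with a 95% confidence interval but does not state how the interval was computed. As an illustration only, the Python sketch below computes binary F1 (the harmonic mean of precision and recall, equivalently 2TP / (2TP + FP + FN)) and a percentile bootstrap interval. The function names, the case-level resampling, `n_boot`, and the seed are assumptions, not the authors' procedure; when several candidate events come from the same patient, resampling whole patients would respect that clustering better.

```python
import random
from typing import List, Sequence, Tuple


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 = 2*TP / (2*TP + FP + FN); label 1 = recurrence event."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0  # define F1 = 0 when no positives exist


def bootstrap_f1_ci(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (point F1, lower, upper) for a percentile-bootstrap (1 - alpha) CI.

    Assumption: cases are resampled independently with replacement.
    """
    rng = random.Random(seed)
    n = len(y_true)
    scores: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(f1_score([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    scores.sort()
    # Nearest-rank percentiles of the sorted bootstrap distribution.
    lo = scores[round(alpha / 2 * (n_boot - 1))]
    hi = scores[round((1 - alpha / 2) * (n_boot - 1))]
    return f1_score(y_true, y_pred), lo, hi


if __name__ == "__main__":
    # Hypothetical toy labels, not study data.
    y_true = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
    y_pred = [1, 1, 0, 1, 1, 1, 0, 0, 1, 1]
    point, lo, hi = bootstrap_f1_ci(y_true, y_pred)
    print(f"F1 = {point:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

For these toy labels there are 6 true positives, 1 false positive and 1 false negative, so the point estimate is 12/14 ≈ 0.857; the interval depends on the resampling seed and on `n_boot`.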
Conclusions
Our models effectively identify both proxy dates for recurrence/progression diagnoses and patients whose cancer recurred/progressed. Given the similar performance of the random forest and the decision tree, the choice between them should be guided by the interpretability required to assist chart review and by the ease of implementation within existing architecture.