Lauren A Scanlon, Phillip J Monaghan, Safwaan Adam
Development of a Machine Learning Algorithm to Predict Abnormalities in Serum Phosphate in a Large Oncology Cohort
JCO Clinical Cancer Informatics, volume 9, e2400312. Published 2025-04-01 (Epub 2025-04-11). DOI: 10.1200/CCI-24-00312 (https://doi.org/10.1200/CCI-24-00312)
Abstract
Purpose: Serum phosphate is commonly measured in oncology patients because oncologic conditions and their treatments are associated with abnormal phosphate levels. All patients attending our institution, a large specialist oncology center, have a standardized order set (SOS) measured, which consists of 15 biochemical tests, including serum phosphate. Our aim was to determine whether abnormalities in serum phosphate could be predicted by a machine learning algorithm (MLA) from the other interrelated variables in the SOS.
Methods: We trained an XGBoost MLA, implemented in Python, to predict the occurrence of abnormal phosphate (<0.5 or >1.78 mmol/L) from the other results in the SOS. To train and test the algorithm, we used 481,150 test results from blood tests performed on 45,174 patients between January 2019 and December 2021, including 5,897 abnormal results.
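The labeling rule in the Methods can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function and constant names are hypothetical, and the prevalence figure assumes that the 481,150 test results are the denominator for the 5,897 abnormal results.

```python
# Hypothetical names; only the cut-offs and counts come from the abstract.
LOW_MMOL_L = 0.5    # results below this are abnormal
HIGH_MMOL_L = 1.78  # results above this are abnormal


def is_abnormal_phosphate(value_mmol_l: float) -> bool:
    """Return True when a serum phosphate result is <0.5 or >1.78 mmol/L."""
    return value_mmol_l < LOW_MMOL_L or value_mmol_l > HIGH_MMOL_L


def label_results(values_mmol_l):
    """Turn phosphate values into 0/1 targets for a binary classifier."""
    return [int(is_abnormal_phosphate(v)) for v in values_mmol_l]


# Class balance implied by the reported counts (assumed denominator).
TOTAL_RESULTS = 481_150
ABNORMAL_RESULTS = 5_897
prevalence = ABNORMAL_RESULTS / TOTAL_RESULTS

print(label_results([0.4, 1.0, 1.9]))
print(f"abnormal prevalence: {prevalence:.2%}")
```

Under that assumption, abnormal results make up a little over 1% of the data, so the classification problem is strongly imbalanced.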
Results: The model was trained and tested on a 70%/30% split (train/test result cohort), achieving an area under the receiver operating characteristic curve of 0.866 (95% CI, 0.857 to 0.875) on the test set. With a prediction threshold chosen so that the model achieves a sensitivity of 0.924 and a specificity of 0.530, and a phosphate test performed only for results above this threshold, the number of phosphate tests in this test set would fall from 142,647 to 67,873 while capturing 1,586 of the 1,716 abnormal results, with a small risk (<0.1%) of missing an abnormal result. The model was further validated on a separate cohort from January 2022 to December 2023, achieving similar performance.
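The reported test-set counts can be cross-checked with a short calculation, shown here alongside one possible way of picking a threshold for a target sensitivity. This is a sketch under stated assumptions: `threshold_for_sensitivity` is a hypothetical helper, not the authors' procedure; it flags scores greater than or equal to the threshold; and the "<0.1%" risk is interpreted here as missed abnormal results as a share of all test-set results, which the abstract does not state explicitly.

```python
import math


def threshold_for_sensitivity(scores, labels, target):
    """Highest threshold t such that flagging score >= t reaches the target
    sensitivity on the given data (hypothetical helper)."""
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("no abnormal results to calibrate on")
    needed = max(math.ceil(target * len(positives)), 1)
    return positives[needed - 1]


# Counts reported for the test set in the Results.
tests_total = 142_647       # phosphate tests in the test set
tests_flagged = 67_873      # tests still performed under the threshold rule
abnormal_total = 1_716      # abnormal results in the test set
abnormal_captured = 1_586   # abnormal results among the flagged tests

# Confusion-matrix cells derived from those counts.
tp = abnormal_captured
fn = abnormal_total - abnormal_captured
fp = tests_flagged - tp
tn = tests_total - tests_flagged - fn

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
tests_avoided = 1 - tests_flagged / tests_total
missed_share = fn / tests_total  # assumed denominator: all test-set results

print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f}")
print(f"tests avoided={tests_avoided:.1%} missed share={missed_share:.3%}")
```

The derived sensitivity and specificity round to the reported 0.924 and 0.530, and the 130 missed abnormal results are below 0.1% of the 142,647 test-set results, so the figures in the Results are mutually consistent under this reading.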
Conclusion: An MLA with high sensitivity has been developed to optimize phosphate testing. Its application in routine care might yield cost savings and health care efficiencies. The methodology used to develop our MLA can be applied to other settings where interrelated variables are measured in an SOS.