Creating a Proxy for Baseline Eastern Cooperative Oncology Group Performance Status in Electronic Health Records for Comparative Effectiveness Research in Advanced Non-Small Cell Lung Cancer.
Michael Johnson, Peining Tao, Mehmet Burcu, John Kang, Richard Baumgartner, Junshui Ma, Vladimir Svetnik
JCO Clinical Cancer Informatics, volume 9, e2400185. Published 2025-04-01 (Epub 2025-04-03). DOI: 10.1200/CCI-24-00185 (https://doi.org/10.1200/CCI-24-00185)
Abstract
Purpose: Eastern Cooperative Oncology Group performance status (ECOG PS) is a key confounder in comparative effectiveness research, predicting treatment and survival, but is often incomplete in electronic health records (EHRs). Imputation on the basis of classification metrics alone may introduce differences in survival between patients with known and imputed ECOG PS, complicating comparative effectiveness research. We developed an approach to impute ECOG PS so that those with known and imputed ECOG PS are indistinguishable in their survival, reducing potential biases introduced by the imputation.
Methods: We analyzed deidentified data from an EHR-derived database for patients with advanced non-small cell lung cancer (aNSCLC) at their first line of treatment. Our novel imputation method involved (1) sample-splitting patients with known ECOG PS into modeling and thresholding data sets, (2) developing a predictive model of ECOG PS, (3) determining the classification threshold that best aligns clinical outcomes between observed and imputed ECOG PS, where the choice of outcome metric may depend on the use case, and (4) applying the model and threshold to impute missing ECOG PS. We evaluated the approach using binary classification metrics and the alignment of survival metrics between observed and imputed ECOG PS.
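The thresholding step (3) and the imputation step (4) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the predictive model, the outcome metric, or the search procedure. The sketch assumes model scores are already available for the thresholding set, uses the gap in median survival as the outcome metric, ignores censoring (a real analysis would likely use Kaplan-Meier medians or hazard ratios), and runs on synthetic data. All names (`median_gap`, `choose_threshold`, `scores`, `surv_months`) are hypothetical.

```python
import random
from statistics import median


def median_gap(scores, observed_poor, surv_months, threshold):
    """Sum of absolute median-survival gaps, over the good and poor groups,
    between observed labels and labels imputed at `threshold`.
    Simplification: survival times are treated as fully observed."""
    imputed_poor = [s >= threshold for s in scores]
    gap = 0.0
    for flag in (False, True):  # False = good PS (<2), True = poor PS (>=2)
        obs = [t for t, y in zip(surv_months, observed_poor) if y == flag]
        imp = [t for t, y in zip(surv_months, imputed_poor) if y == flag]
        if not obs or not imp:
            return float("inf")  # this threshold leaves a group empty
        gap += abs(median(obs) - median(imp))
    return gap


def choose_threshold(scores, observed_poor, surv_months, grid):
    """Step 3 (sketch): pick the grid value with the smallest survival gap."""
    return min(grid, key=lambda t: median_gap(scores, observed_poor, surv_months, t))


# Synthetic thresholding set (assumption: stands in for patients with known ECOG PS).
rng = random.Random(0)
n = 2000
observed_poor = [rng.random() < 0.2 for _ in range(n)]
# Hypothetical model scores: noisy, higher on average for poor PS.
scores = [min(1.0, max(0.0, rng.gauss(0.55 if y else 0.30, 0.15))) for y in observed_poor]
# Synthetic survival in months: shorter on average for poor PS.
surv_months = [rng.expovariate(1 / (6.0 if y else 14.0)) for y in observed_poor]

grid = [i / 100 for i in range(5, 96)]
best = choose_threshold(scores, observed_poor, surv_months, grid)

# Step 4 (sketch): apply the chosen threshold to scores of patients with missing ECOG PS.
scores_missing = [rng.random() for _ in range(5)]
imputed_poor_missing = [s >= best for s in scores_missing]
```

The design point is that the threshold is selected on an outcome-alignment criterion rather than on classification accuracy alone, so it may differ from the conventional 0.5 cutoff.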
Results: Of 62,101 patients, 13,297 (21%) had missing ECOG PS at the start of their first treatment. Compared with other techniques, our method achieved similar or better accuracy (73.3%), sensitivity (42.4%), and specificity (81%), and smaller differences in survival metrics between observed and imputed ECOG PS: 0.07 in hazard ratio, -0.36 months in median survival for good ECOG PS (<2), and -0.39 months for poor ECOG PS (≥2).
Conclusion: Imputing ECOG PS in a way that aligns clinical outcomes enhanced the use of real-world EHR data from patients with aNSCLC for comparative effectiveness research.