Patrick Lewicki, Yasmin Benhalim, Joshua Bradin, Kim Dryden, Husain Hakim, Benjamin Heasman, Ana Taylor, Jawad Aqeel, Anuush Vejalla, Marisa Conte, Rachel Richesson, Kristian Stensland
JCO Clinical Cancer Informatics, volume 9, e2400261. Published 2025-04-01 (Epub 2025-04-25). DOI: 10.1200/CCI-24-00261. https://doi.org/10.1200/CCI-24-00261
Development and Evaluation of an Electronic Health Record-Derived Computable Phenotype to Identify Patients Undergoing Prostate Cancer Screening.
Purpose: Given challenges with randomized trials, tumor registries, and insurance claims, electronic health record data are an appealing resource for studying prostate-specific antigen (PSA) screening for prostate cancer. Transparent, well-evaluated computable phenotypes that observe a stringent definition of screening (versus for-cause, diagnosis- or symptom-directed testing) are critical for reproducibility and comparison with prospective cohorts.
Methods: A cohort of patients who underwent PSA testing in a primary care setting at a large, tertiary health care system was identified. Gold-standard labels for screening versus not screening were created via a combination of clinical note text review and exclusionary diagnosis codes. Ten computable phenotype definitions were created by urology content experts and then evaluated for sensitivity, specificity, and positive predictive value (PPV) and negative predictive value against gold-standard labels.
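The evaluation described in the Methods can be sketched as follows: a minimal illustration of computing sensitivity, specificity, PPV, and negative predictive value for one phenotype against gold-standard labels. The function names (`wilson_interval`, `evaluate_phenotype`) are hypothetical, and the Wilson score interval is an assumption made for this sketch; the abstract does not state which confidence-interval method the authors used.

```python
from math import sqrt


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (about 95% when z = 1.96).

    Assumption: the abstract does not name its CI method; Wilson is used
    here only as a common, well-behaved choice.
    """
    if total == 0:
        return (float("nan"), float("nan"))
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def evaluate_phenotype(predicted, gold):
    """Compare phenotype output with gold-standard labels.

    Both arguments are equal-length sequences of booleans where True means
    "screening". Returns metric name -> (estimate, ci_low, ci_high).
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")

    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p and g)          # true positives
    fp = sum(1 for p, g in pairs if p and not g)      # false positives
    tn = sum(1 for p, g in pairs if not p and not g)  # true negatives
    fn = sum(1 for p, g in pairs if not p and g)      # false negatives

    def metric(numerator, denominator):
        estimate = numerator / denominator if denominator else float("nan")
        low, high = wilson_interval(numerator, denominator)
        return (estimate, low, high)

    return {
        "sensitivity": metric(tp, tp + fn),
        "specificity": metric(tn, tn + fp),
        "ppv": metric(tp, tp + fp),
        "npv": metric(tn, tn + fn),
    }
```

Calling `evaluate_phenotype` once per candidate definition on the same labeled cohort would give a comparable table of performance characteristics across definitions.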
Results: Three hundred fifty-five patients with gold-standard labels were included in the final study cohort. Depending on how missing text data were classified (not applicable versus screening), either 149 (50.3%) or 208 (58.6%) patients underwent screening. No single phenotype optimized both sensitivity and PPV, although a composite definition that included either (1) absence of symptoms or (2) presence of an encounter-for-screening code achieved a very high PPV of 0.99 (95% CI, 0.96 to 1.00) with a reasonable sensitivity of 0.82 (95% CI, 0.75 to 0.88).
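The composite definition reported above is a logical OR of two code-based conditions, which might be expressed as in the sketch below. The code lists are assumptions for illustration only: Z12.5 is the ICD-10-CM code for an encounter for screening for malignant neoplasm of prostate, while the symptom prefixes shown are placeholder examples and not the study's actual code set, which the abstract does not give. All function and constant names are hypothetical.

```python
# Assumption: ICD-10-CM Z12.5 stands in for the "encounter for screening" code.
SCREENING_ENCOUNTER_PREFIXES = ("Z12.5",)

# Assumption: placeholder symptom/diagnosis prefixes, NOT the study's list.
# R31 hematuria, R35 polyuria, R39.1 other difficulties with micturition,
# N40.1 benign prostatic hyperplasia with lower urinary tract symptoms.
SYMPTOM_PREFIXES = ("R31", "R35", "R39.1", "N40.1")


def has_code(codes, prefixes):
    """True if any diagnosis code starts with one of the given prefixes."""
    return any(code.strip().upper().startswith(prefixes) for code in codes)


def composite_screening_phenotype(codes):
    """Classify a PSA test as screening from its associated diagnosis codes.

    Mirrors the composite rule in the abstract: screening if there is
    (1) no symptom code OR (2) an encounter-for-screening code.
    """
    no_symptoms = not has_code(codes, SYMPTOM_PREFIXES)
    screening_encounter = has_code(codes, SCREENING_ENCOUNTER_PREFIXES)
    return no_symptoms or screening_encounter
```

Under this rule a test with only a symptom code is classified as for-cause, whereas a test with both a symptom code and a screening-encounter code is still counted as screening. Which codes are attached to a given PSA test, and over what lookback window, is a design choice the sketch leaves open.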
Conclusion: We identify code-based PSA screening phenotypes with a range of performance characteristics. The prevalence of for-cause, diagnosis- and symptom-directed testing is significant and may contaminate cohorts that do not take related codes into account.