Matthew Spotnitz, John Giannini, Emily Clark, Yechiam Ostchega, Tamara R Litwin, Stephanie L Goff, Lew Berman
JCO Clinical Cancer Informatics, volume 9, e2500078. Journal Article. Published 2025-07-01 (Epub 2025-07-08). DOI: 10.1200/CCI-25-00078 (https://doi.org/10.1200/CCI-25-00078). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12240465/pdf/
Assessing the Data Quality Dimensions of Surgical Oncology Cohorts in the All of Us Research Program.
Purpose: Cancer is a leading cause of morbidity and mortality in the United States. Mapping electronic health record (EHR) data to the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) may standardize data structure and allow for multi-database oncology studies. However, the number of oncology studies produced with the OMOP CDM has been low. To investigate the discrepancy between the public health impact of cancer and the output of OMOP CDM clinical cancer studies, we evaluated the EHR data quality of five surgical oncology cohorts in the All of Us Research Program: mastectomy, prostatectomy, colectomy, melanoma excision, and lung cancer resection.
Methods: We selected procedure codes that were the basis of each phenotype. We used a data quality checklist to evaluate five domains systematically: conformance, completeness, concordance, plausibility, and temporality.
Results: Most phenotype-defining source codes were mapped to Current Procedural Terminology 4, which is an EHR standard. All cohorts had low concept prevalence. Most bivariate correlations between concepts were weak (ρ ≤ 0.5). The small number of biomarkers available for use limited our plausibility analysis. The median time between biopsy and surgery varied across cohorts.
Conclusion: We identified multiple data completeness issues, which limited the fitness-for-use evaluation. In addition, using the OMOP CDM procedure concepts and mappings presented challenges for our study. Variable amounts of missingness in OMOP CDM surgical oncology data may affect the fitness for use of cancer data. Further research is warranted to improve the quality of these data.
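Illustrative sketch (not the authors' code): the temporality and completeness checks summarized above could be approximated as follows. The record layout, the toy dates, and the helper names `biopsy_to_surgery_days` and `completeness` are hypothetical, and the choice to exclude rows where surgery precedes biopsy is an assumption made for illustration, not a rule stated in the abstract.

```python
from datetime import date
from statistics import median

# Hypothetical toy records: (person_id, biopsy_date, surgery_date).
# None marks a missing date, as can occur in incomplete EHR data.
records = [
    (1, date(2020, 1, 10), date(2020, 2, 9)),
    (2, date(2020, 3, 1), date(2020, 3, 15)),
    (3, None, date(2020, 5, 20)),             # biopsy date missing
    (4, date(2020, 6, 5), date(2020, 6, 1)),  # surgery before biopsy
    (5, date(2020, 7, 1), date(2020, 8, 15)),
]


def biopsy_to_surgery_days(rows):
    """Day intervals for rows with both dates and biopsy on or before surgery."""
    intervals = []
    for _person_id, biopsy, surgery in rows:
        if biopsy is None or surgery is None:
            continue  # incomplete row: cannot compute an interval
        days = (surgery - biopsy).days
        if days < 0:
            continue  # assumed rule: drop implausible ordering
        intervals.append(days)
    return intervals


def completeness(rows):
    """Fraction of rows in which both dates are present."""
    if not rows:
        return 0.0
    both = sum(1 for _, b, s in rows if b is not None and s is not None)
    return both / len(rows)


intervals = biopsy_to_surgery_days(records)
median_days = median(intervals)
share_complete = completeness(records)
print(f"median days biopsy to surgery: {median_days}")
print(f"rows with both dates: {share_complete:.0%}")
```

In a real OMOP CDM analysis the dates would come from procedure records for each cohort rather than from a hand-written list, and the median would be reported per cohort so that differences across cohorts become visible.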