Assessment of BRCA1 and BRCA2 Germline Variant Data From Patients With Breast Cancer in a Real-World Data Registry
Thales C Nepomuceno, Paulo Lyra, Jianbin Zhu, Fanchao Yi, Rachael H Martin, Daniel Lupu, Luke Peterson, Lauren C Peres, Anna Berry, Edwin S Iversen, Fergus J Couch, Qianxing Mo, Alvaro N Monteiro
JCO Clinical Cancer Informatics, published 2024-05-01. DOI: 10.1200/CCI.23.00251 (https://doi.org/10.1200/CCI.23.00251). Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11161245/pdf/
Abstract
Purpose: The emergence of large real-world clinical databases and tools to mine electronic medical records has allowed for an unprecedented look at large data sets with clinical and epidemiologic correlates. In clinical cancer genetics, real-world databases allow for the investigation of prevalence and effectiveness of prevention strategies and targeted treatments and for the identification of barriers to better outcomes. However, real-world data sets have inherent biases and problems (eg, selection bias, incomplete data, measurement error) that may hamper adequate analysis and affect statistical power.
Methods: Here, we leverage a real-world clinical data set from a large health network for patients with breast cancer tested for variants in BRCA1 and BRCA2 (N = 12,423). We conducted data cleaning and harmonization, cross-referenced with publicly available databases, performed variant reassessment and functional assays, and used functional data to inform a variant's clinical significance by applying the American College of Medical Genetics and Genomics and Association for Molecular Pathology (ACMG/AMP) guidelines.
Results: In the cohort, White and Black patients were over-represented, whereas non-White Hispanic and Asian patients were under-represented. Incorrect or missing variant designations were the most significant contributor to data loss. Although manual curation corrected many incorrect designations, a sizable fraction of carriers still had incorrect or missing variant designations. Despite the large number of patients for whom clinical significance was not reported, the originally reported clinical significance assessments were accurate. Reassessment of variants for which clinical significance was not reported led to a marked improvement in data quality.
Conclusion: We identify the most common issues with BRCA1 and BRCA2 testing data entry and suggest approaches to minimize data loss and keep interpretation of clinical significance of variants up to date.
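The abstract does not describe how the data cleaning and harmonization step was implemented. As a rough illustration only, the kind of check that flags missing or malformed variant designations might look like the sketch below. The function name `harmonize`, the status labels, and the simplified pattern for coding-DNA designations are all assumptions made for this example; they are not the authors' pipeline, and real HGVS nomenclature is considerably richer than this pattern covers.

```python
import re
from typing import Optional, Tuple

# Hypothetical, deliberately simplified pattern for coding-DNA designations
# such as "c.181T>G", "c.68_69del" or "c.5266dupC". Not a full HGVS parser.
_CDNA = re.compile(
    r"c\.[0-9*+\-_]+(?:[ACGT]>[ACGT]|(?:delins|del|dup|ins)[ACGT]*)"
)
_GENES = {"BRCA1", "BRCA2"}


def harmonize(gene: str, variant: Optional[str]) -> Tuple[str, Optional[str], str]:
    """Return (gene, cleaned_variant, status) for one registry record.

    status is one of the illustrative labels:
    "ok", "missing", "needs_review", "unknown_gene".
    """
    gene_clean = gene.strip().upper()
    if gene_clean not in _GENES:
        return gene_clean, None, "unknown_gene"

    # An empty or absent designation cannot be recovered automatically.
    if variant is None or not variant.strip():
        return gene_clean, None, "missing"

    # Remove stray whitespace and normalize the prefix case ("C." -> "c.").
    v = re.sub(r"\s+", "", variant)
    if v[:2].lower() == "c.":
        v = "c." + v[2:]

    # Anything that does not fit the simplified pattern (for example a
    # legacy designation without the "c." prefix) goes to manual curation.
    if _CDNA.fullmatch(v):
        return gene_clean, v, "ok"
    return gene_clean, v, "needs_review"


if __name__ == "__main__":
    records = [
        (" brca1 ", "C. 181T>G"),
        ("BRCA2", None),
        ("BRCA1", "185delAG"),
    ]
    for gene, variant in records:
        print(harmonize(gene, variant))
```

Records returned with a status other than "ok" correspond to the designations that, per the Results, either required manual curation or remained unresolved. Routing them to review rather than dropping them is one way to limit data loss.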