A consensus-based classification workflow to determine genetically inferred ancestry from comprehensive genomic profiling of patients with solid tumors.

Impact factor: 6.8 · Zone 2 (Biology) · JCR Q1 (Biochemical Research Methods)

Briefings in Bioinformatics · Pub Date: 2024-09-23 · DOI: 10.1093/bib/bbae557

Zachary D Wallen, Mary K Nesline, Sarabjot Pabla, Shuang Gao, Erik Vanroey, Stephanie B Hastings, Heidi Ko, Kyle C Strickland, Rebecca A Previs, Shengle Zhang, Jeffrey M Conroy, Taylor J Jensen, Elizabeth George, Marcia Eisenberg, Brian Caveney, Pratheesh Sathyan, Shakti Ramkissoon, Eric A Severson


Abstract

Disparities in cancer diagnosis, treatment, and outcomes based on self-identified race and ethnicity (SIRE) are well documented, yet these variables have historically been excluded from clinical research. Without SIRE, genetic ancestry can be inferred using single-nucleotide polymorphisms (SNPs) detected from tumor DNA using comprehensive genomic profiling (CGP). However, factors inherent to CGP of tumor DNA increase the difficulty of identifying ancestry-informative SNPs, and current workflows for inferring genetic ancestry from CGP need improvements in key areas of the ancestry inference process. This study used genomic data from 4274 diverse reference subjects and CGP data from 491 patients with solid tumors and SIRE to develop and validate a workflow to obtain accurate genetically inferred ancestry (GIA) from CGP sequencing results. We use consensus-based classification to derive confident ancestral inferences from an expanded reference dataset covering eight world populations (African, Admixed American, Central Asian/Siberian, European, East Asian, Middle Eastern, Oceania, South Asian). Our GIA calls were highly concordant with SIRE (95%) and aligned well with reference populations of inferred ancestries. Further, our workflow could expand on SIRE by (i) detecting the ancestry of patients who usually lack appropriate racial categories, (ii) determining which patients have mixed ancestry, and (iii) resolving the ancestries of patients in heterogeneous racial categories and of patients with missing SIRE. Accurate GIA provides needed information to enable ancestry-aware biomarker research, ensure the inclusion of underrepresented groups in clinical research, and increase the diverse representation of patient populations eligible for precision medicine therapies and trials.
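
The abstract does not say which classifiers are combined or how agreement is scored, so the following is only a generic sketch of what consensus-based classification and a concordance check could look like. The function names (`consensus_call`, `concordance`), the `min_votes` threshold, the `"Unresolved"` label, and the population codes are all hypothetical and are not taken from the paper; the sketch also assumes SIRE categories have already been mapped onto the same labels as the ancestry calls.

```python
from collections import Counter

UNRESOLVED = "Unresolved"  # hypothetical label for samples without consensus


def consensus_call(votes, min_votes=2):
    """Return the label that at least `min_votes` classifiers agree on.

    `votes` holds one predicted ancestry label per classifier. A tie for
    first place, or too few agreeing votes, yields UNRESOLVED.
    """
    if not votes:
        return UNRESOLVED
    ranked = Counter(votes).most_common()
    top_label, top_count = ranked[0]
    # A tie for first place means there is no single consensus label.
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return UNRESOLVED
    return top_label if top_count >= min_votes else UNRESOLVED


def concordance(calls, reported):
    """Fraction of resolved calls that match the reported label.

    Samples with an unresolved call or a missing (None) reported label
    are excluded from the denominator.
    """
    pairs = [
        (c, r) for c, r in zip(calls, reported)
        if c != UNRESOLVED and r is not None
    ]
    if not pairs:
        return float("nan")
    return sum(c == r for c, r in pairs) / len(pairs)


if __name__ == "__main__":
    # Three hypothetical classifiers voting on three samples.
    per_sample_votes = [
        ["EUR", "EUR", "AFR"],
        ["AFR", "AFR", "AFR"],
        ["EUR", "AFR", "EAS"],
    ]
    calls = [consensus_call(v) for v in per_sample_votes]
    print(calls)
    print(concordance(calls, ["EUR", "AFR", None]))
```

Requiring a minimum number of agreeing votes, rather than taking a bare plurality, is one simple way to trade coverage for confidence: samples on which the classifiers disagree are left unresolved instead of being forced into a category.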


Source journal

Briefings in Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 13.20

Self-citation rate: 13.70%

Articles published: 549

Review time: 6 months

About the journal: Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data. The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.