An analytic pipeline to obtain reliable genetic ancestry estimates from tumor-derived RNA sequencing data.

Courtney E Johnson, Ximing Ran, Julia Wrobel, Natalie R Davidson, Casey S Greene, Michael P Epstein, Jeffrey R Marks, Lauren C Peres, Jennifer A Doherty, Joellen M Schildkraut
Cancer Epidemiology, Biomarkers & Prevention (journal article, published 2025-07-07). DOI: 10.1158/1055-9965.EPI-25-0371

Abstract

Background: Germline genetics may influence tumor molecular characteristics and, ultimately, cancer survival. Studies of tumor characteristics, including our epithelial ovarian cancer (EOC) studies of Black women in the United States, may have RNASeq data from archival tumor tissue but lack germline DNA for at least some individuals. Incomplete germline DNA measurements reduce sample sizes and thereby impede analyses of important measures, such as global genetic ancestry, that are often used in downstream analyses.

Methods: The study population consists of 184 women from two population-based studies of EOC who had both germline and formalin-fixed paraffin-embedded (FFPE) tumor samples, and an additional 58 women diagnosed with EOC in the same two studies who had only FFPE tumor tissue. We used tumor RNASeq data to calculate proportions of African, European, and Asian genetic ancestry with a pipeline built on the packages SeqKit, HISAT2, SAMtools, BCFtools, plink, and ADMIXTURE. Women from the 1000 Genomes Project served as the reference populations, and germline genetic ancestry estimates from blood or saliva served as the baseline for comparison. We evaluated multiple quality control strategies to improve genetic ancestry estimation.
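The abstract names the tools but not how they are chained. The following is a minimal Python sketch under one plausible ordering for a single tumor sample: filter reads with SeqKit, align with HISAT2, sort and index with SAMtools, genotype at reference-panel sites with BCFtools, convert and merge with a 1000 Genomes reference fileset in plink 1.9, then run ADMIXTURE with K = 3. The function names `build_ancestry_commands` and `run_pipeline`, every file name, the 30 bp read-length cutoff, and the thread count are hypothetical; the authors' actual commands and quality control strategies are not described here.

```python
import subprocess


def build_ancestry_commands(sample, fastq, hisat2_index, ref_fasta,
                            panel_sites, ref_bfile, k=3, threads=4):
    """Return the command lines for one tumor sample, in execution order.

    Hypothetical sketch: file names and parameter values are placeholders,
    not the published pipeline's settings.
    """
    t = str(threads)
    return [
        # 1. Drop very short reads (SeqKit); the 30 bp cutoff is illustrative.
        ["seqkit", "seq", "-m", "30", "-o", f"{sample}.filt.fq.gz", fastq],
        # 2. Spliced alignment of single-end RNA reads to the genome (HISAT2).
        ["hisat2", "-p", t, "-x", hisat2_index,
         "-U", f"{sample}.filt.fq.gz", "-S", f"{sample}.sam"],
        # 3. Coordinate-sort and index the alignments (SAMtools).
        ["samtools", "sort", "-@", t, "-o", f"{sample}.bam", f"{sample}.sam"],
        ["samtools", "index", f"{sample}.bam"],
        # 4. Pile up reads only at reference-panel sites, then call genotypes
        #    (BCFtools). No -v, so homozygous-reference sites are kept.
        ["bcftools", "mpileup", "-f", ref_fasta, "-R", panel_sites,
         "-Ou", "-o", f"{sample}.pileup.bcf", f"{sample}.bam"],
        ["bcftools", "call", "-m", "-Oz", "-o", f"{sample}.vcf.gz",
         f"{sample}.pileup.bcf"],
        # 5. Convert to PLINK binary format and merge with the reference
        #    fileset (plink 1.9 accepts a fileset prefix for --bmerge).
        ["plink", "--vcf", f"{sample}.vcf.gz", "--make-bed", "--out", sample],
        ["plink", "--bfile", sample, "--bmerge", ref_bfile,
         "--make-bed", "--out", f"{sample}.merged"],
        # 6. Estimate K ancestry proportions (ADMIXTURE).
        ["admixture", f"{sample}.merged.bed", str(k)],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command (dry run) or execute it, stopping on the first failure."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmds = build_ancestry_commands("T01", "T01.fq.gz", "grch38_index",
                                   "grch38.fa", "panel_sites.tsv.gz", "kg_ref")
    run_pipeline(cmds)
```

By default `run_pipeline` only prints the commands; passing `dry_run=False` executes them, which assumes the tools are on `PATH` and that the HISAT2 index and reference fileset already exist. ADMIXTURE would then write the proportions to `T01.merged.3.Q`. In practice the merge step also needs variant-ID, allele, and strand harmonization between the tumor calls and the reference panel, which this sketch omits.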

Results: Correlations between tumor RNASeq-derived genetic ancestry estimates from our pipeline and germline-derived African and European genetic ancestry ranged from 0.76 to 0.94.

Conclusions: RNASeq data from archival FFPE tumor tissue can be confidently and efficiently used to approximate global genetic ancestry in an admixed population when germline DNA is unavailable.

Impact: This approach supports analyses of genetic ancestry and cancer when germline samples are not available.
