Courtney E Johnson, Ximing Ran, Julia Wrobel, Natalie R Davidson, Casey S Greene, Michael P Epstein, Jeffrey R Marks, Lauren C Peres, Jennifer A Doherty, Joellen M Schildkraut
Cancer Epidemiology, Biomarkers & Prevention (a publication of the American Association for Cancer Research, cosponsored by the American Society of Preventive Oncology). Journal article, publication date 2025-07-07. DOI: 10.1158/1055-9965.EPI-25-0371
An analytic pipeline to obtain reliable genetic ancestry estimates from tumor-derived RNA sequencing data.
Background: Germline genetics may influence tumor molecular characteristics and, ultimately, cancer survival. Studies of tumor characteristics, including our epithelial ovarian cancer (EOC) studies of Black women in the United States, may have RNASeq data from archival tumor tissue but lack germline DNA for at least some individuals. Missing germline DNA reduces the sample size available for analyses of important measures such as global genetic ancestry, which is often used in downstream analyses.
Methods: The study population consists of 184 women from two population-based studies of EOC who had both germline and formalin-fixed paraffin-embedded (FFPE) tumor samples, plus an additional 58 women diagnosed with EOC in the same two studies who had only FFPE tumor tissue. We used tumor RNASeq data to calculate proportions of African, European, and Asian genetic ancestry with a pipeline built on SeqKit, HISAT2, SAMtools, BCFtools, PLINK, and ADMIXTURE. Women from the 1000 Genomes Project served as the reference populations, and germline genetic ancestry estimates from blood or saliva served as the baseline for comparison. We evaluated multiple quality control strategies to improve genetic ancestry estimation.
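ADMIXTURE writes one row of ancestry proportions per individual (a `.Q` file, rows in `.fam` order), but in an unsupervised run the cluster columns are unnamed and their order is arbitrary. One common way to turn columns into named ancestries is to merge study samples with reference individuals of known origin and name each column after the reference group that loads on it most heavily. The sketch below is a hypothetical illustration of that idea, not the authors' code: it assumes an unsupervised run with K = 3 on tumor-derived genotypes merged with 1000 Genomes reference women, and the labels `AFR`, `EUR`, `EAS` and all function names are assumptions. ADMIXTURE also has a supervised mode (`--supervised` with a `.pop` file); the abstract does not say which mode the pipeline used.

```python
"""Hypothetical helper: name ADMIXTURE clusters using labelled reference samples.

Assumption: an unsupervised ADMIXTURE run on study samples merged with
reference individuals, so each Q-matrix row is one individual and each
column is one unnamed ancestral cluster.
"""


def read_q_file(path):
    """Parse a whitespace-delimited ADMIXTURE .Q file into lists of floats."""
    with open(path) as fh:
        return [[float(x) for x in line.split()] for line in fh if line.strip()]


def label_clusters(q_rows, ref_labels):
    """Name each Q column after the reference population with the highest
    mean membership in that column.

    q_rows: per-individual ancestry proportions (rows of a .Q file).
    ref_labels: {row_index: population_label} for reference individuals
                (labels such as "AFR" are assumptions for illustration).
    Returns one label per column.
    """
    k = len(q_rows[0])
    sums = {}
    counts = {}
    for idx, pop in ref_labels.items():
        sums.setdefault(pop, [0.0] * k)
        counts[pop] = counts.get(pop, 0) + 1
        for j, q in enumerate(q_rows[idx]):
            sums[pop][j] += q
    labels = []
    for j in range(k):
        # Reference population whose members load most heavily on column j.
        best = max(sums, key=lambda pop: sums[pop][j] / counts[pop])
        labels.append(best)
    if len(set(labels)) != k:
        raise ValueError("clusters do not map one-to-one onto reference populations")
    return labels


def study_proportions(q_rows, labels, study_indices):
    """Return {row_index: {label: proportion}} for the study (non-reference) rows."""
    return {i: dict(zip(labels, q_rows[i])) for i in study_indices}
```

For example, with three reference rows that are almost pure in one column each and a fourth study row `[0.20, 0.78, 0.02]`, the columns are named from the reference rows and the study sample is reported as 78% of whichever label the second column received. Raising an error when two columns map to the same reference group makes a poorly separated run visible instead of silently mislabeling ancestries.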
Results: Correlations between our pipeline's tumor RNASeq-derived genetic ancestry estimates and germline-derived African and European genetic ancestry ranged from 0.76 to 0.94.
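The agreement reported above is a correlation, computed per ancestry component, between tumor-derived and germline-derived estimates for women who have both. The sketch below shows that comparison under stated assumptions: it assumes Pearson correlation (the abstract does not name the coefficient), and the function names and the nested-dict input layout are hypothetical. It does not reproduce the reported values.

```python
"""Sketch: per-component agreement between tumor- and germline-derived ancestry.

Assumptions: Pearson correlation; estimates stored as
{sample_id: {component: proportion}}; only samples present in both
sources are compared.
"""
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def per_component_correlation(tumor, germline, components):
    """Correlate each ancestry component across samples found in both dicts."""
    shared = sorted(set(tumor) & set(germline))
    return {
        c: pearson([tumor[s][c] for s in shared], [germline[s][c] for s in shared])
        for c in components
    }
```

Restricting to shared sample IDs mirrors the study design, in which only the women with both germline and FFPE tumor samples can serve as the baseline comparison; tumor-only women receive estimates but contribute nothing to the correlation.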
Conclusions: RNASeq data from archival FFPE tumor tissue can be confidently and efficiently used to approximate global genetic ancestry in an admixed population when germline DNA is unavailable.
Impact: This approach supports analyses of genetic ancestry and cancer when germline samples are not available.