Saptarshi Chakraborty, Zoe Guan, Caroline E. Kostrzewa, Ronglai Shen, Colin B. Begg
Genetic Epidemiology, vol. 48, no. 8, pp. 455-467. Published 2024-04-30. DOI: 10.1002/gepi.22565. https://onlinelibrary.wiley.com/doi/10.1002/gepi.22565
Identifying somatic fingerprints of cancers defined by germline and environmental risk factors
Numerous studies over the past generation have identified germline variants that increase specific cancer risks. Simultaneously, a revolution in sequencing technology has permitted high-throughput annotations of somatic genomes characterizing individual tumors. However, examining the relationship between germline variants and somatic alteration patterns is greatly complicated by the large numbers of variants in a typical tumor, the rarity of most individual variants, and the heterogeneity of tumor somatic fingerprints. In this article, we propose statistical methodology that frames the investigation of germline-somatic relationships in an interpretable manner. The method uses meta-features embodying biological contexts of individual somatic alterations to implicitly group rare mutations. Our team has used this technique previously through a multilevel regression model to diagnose tumor site of origin with high accuracy. Herein, we further leverage topic models from computational linguistics to achieve interpretable lower-dimensional embeddings of the meta-features. We demonstrate how the method can identify distinctive somatic profiles linked to specific germline variants or environmental risk factors. We illustrate the method using The Cancer Genome Atlas whole-exome sequencing data to characterize somatic tumor fingerprints in breast cancer patients with germline BRCA1/2 mutations and in head and neck cancer patients exposed to human papillomavirus.
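The two ideas in the abstract, grouping rare mutations through shared meta-features and embedding tumors with a topic model, can be illustrated with a small sketch. The abstract does not specify the authors' meta-features or model, so everything here is an assumption made for illustration: the toy variants, the `gene:`/`ctx:` token scheme, the `C>T@ACG` context notation, the function names `meta_features` and `fit_lda`, and the choice of a plain latent Dirichlet allocation fitted by collapsed Gibbs sampling. It should not be read as the paper's method.

```python
import random

# Hypothetical toy data: each tumor is a list of somatic variants written as
# (gene, substitution-in-trinucleotide-context). All values are illustrative.
tumors = {
    "T1": [("BRCA2", "C>T@ACG"), ("TP53", "C>T@ACG"), ("PIK3CA", "C>G@TCA")],
    "T2": [("TP53", "C>T@ACG"), ("GATA3", "C>T@ACG")],
    "T3": [("PIK3CA", "C>G@TCA"), ("CASP8", "C>G@TCA"), ("NOTCH1", "C>G@TCT")],
}


def meta_features(variant):
    """Map one variant to meta-feature tokens (assumed scheme: gene and context)."""
    gene, context = variant
    return [f"gene:{gene}", f"ctx:{context}"]


# Each tumor becomes a "document" of meta-feature tokens, so distinct rare
# variants that share a context contribute to the same token.
docs = {tid: [tok for v in variants for tok in meta_features(v)]
        for tid, variants in tumors.items()}


def fit_lda(documents, n_topics=2, alpha=0.5, beta=0.1, n_iter=200, seed=0):
    """Collapsed Gibbs sampler for LDA; returns (vocab, per-document topic mix)."""
    rng = random.Random(seed)
    vocab = sorted({tok for doc in documents for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in documents]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics

    def bump(d, w, z, delta):
        # Add or remove one token of word w in document d from topic z.
        doc_topic[d][z] += delta
        topic_word[z][w] += delta
        topic_total[z] += delta

    # Random initial topic assignment for every token.
    assign = []
    for d, doc in enumerate(documents):
        zs = [rng.randrange(n_topics) for _ in doc]
        for tok, z in zip(doc, zs):
            bump(d, index[tok], z, 1)
        assign.append(zs)

    for _ in range(n_iter):
        for d, doc in enumerate(documents):
            for i, tok in enumerate(doc):
                w = index[tok]
                bump(d, w, assign[d][i], -1)  # hold this token out
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                bump(d, w, z, 1)

    # Smoothed topic proportions: the low-dimensional embedding of each tumor.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in doc_topic[d]]
        for d, doc in enumerate(documents)
    ]
    return vocab, theta


vocab, theta = fit_lda(list(docs.values()))
for tid, row in zip(docs, theta):
    print(tid, [round(p, 2) for p in row])
```

In this toy setting the BRCA2 and TP53 variants in T1 are different mutations, yet both contribute the same `ctx:C>T@ACG` token, which is the sense in which meta-features group rare events. Each row of `theta` is a tumor's topic mixture; in an analysis of the kind the abstract describes, such mixtures would then be compared between groups, for example carriers and non-carriers of a germline variant.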
About the journal:
Genetic Epidemiology is a peer-reviewed journal for discussion of research on the genetic causes of the distribution of human traits in families and populations. Emphasis is placed on the relative contribution of genetic and environmental factors to human disease as revealed by genetic, epidemiological, and biologic investigations.
Genetic Epidemiology primarily publishes papers in statistical genetics, a research field chiefly concerned with the development of statistical, bioinformatical, and computational models for analyzing genetic data. Incorporation of underlying biology and population genetics into conceptual models is favored. The Journal seeks original articles comprising either applied research or innovative statistical, mathematical, computational, or genomic methodologies that advance studies in genetic epidemiology. Other types of reports are encouraged, such as letters to the editor, topic reviews, and perspectives from other fields of research that will likely enrich the field of genetic epidemiology.