A Dirichlet-multinomial mixed model for determining differential abundance of mutational signatures.

IF 2.9 · Zone 3 (Biology) · Q2 BIOCHEMICAL RESEARCH METHODS

BMC Bioinformatics · Pub Date: 2025-02-18 · DOI: 10.1186/s12859-025-06055-x

Lena Morrill Gavarró, Dominique-Laurent Couturier, Florian Markowetz

Volume 26, Issue 1, Article 59 · Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11837616/pdf/

Citations: 0

Abstract



Background: Mutational processes of diverse origin leave their imprints in the genome during tumour evolution. These imprints are called mutational signatures, and they have been characterised for point mutations, structural variants and copy number changes. Each signature has an exposure, or abundance, per sample, which indicates how much the corresponding process has contributed to the overall genomic change. Mutational processes are not static, and a better understanding of their dynamics is key to characterising tumour evolution and identifying cancer cell vulnerabilities that can be exploited during treatment. However, the structure of the data typically collected in this context makes it difficult to test whether signature exposures differ between conditions or time points when comparing groups of samples. In general, the data consist of multivariate mutational count data (e.g. signature exposures) with two observations per patient, one for each group.
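To make the data structure described above concrete, here is a minimal Python sketch with made-up counts. All numbers, and the three generic signatures sig1, sig2 and sig3, are hypothetical. It shows why exposures are treated as compositions and why the two observations of a patient are compared as a pair. The additive log-ratio (ALR) transform is only an illustrative choice and is not necessarily what CompSign does internally.

```python
import numpy as np

# Hypothetical toy data: three patients, two observations each (one per group).
# Rows are observations; columns are exposures (mutation counts) of three
# generic signatures, sig1, sig2 and sig3.
patient = np.array([0, 0, 1, 1, 2, 2])
group = np.array([0, 1, 0, 1, 0, 1])  # 0 = clonal, 1 = subclonal
counts = np.array([
    [120, 300, 80],
    [15, 60, 25],
    [200, 150, 50],
    [40, 20, 40],
    [90, 90, 20],
    [10, 30, 10],
])

# Mutation burdens differ between observations, so only relative abundances
# (compositions) are comparable: each row is rescaled to sum to 1.
props = counts / counts.sum(axis=1, keepdims=True)

# Additive log-ratio (ALR) transform, with the last signature as reference.
alr = np.log(props[:, :-1] / props[:, -1:])

# The two observations of a patient are not independent, so compare them
# within patient: subclonal minus clonal log-ratios, one row per patient.
n_patients = patient.max() + 1
paired_diff = np.array([
    alr[(patient == p) & (group == 1)][0] - alr[(patient == p) & (group == 0)][0]
    for p in range(n_patients)
])
print(paired_diff.shape)  # (3, 2)
```

A per-patient difference like this ignores the sampling noise that comes from observations having very different mutation counts, as well as any overdispersion; a count-based mixed model is one way to account for both.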

Results: We propose a mixed-effects Dirichlet-multinomial model in which within-patient correlations are accounted for by random effects, possible correlations between signatures are captured by making these random effects multivariate, and a group-specific dispersion parameter accommodates differences between the groups. Moreover, the fixed-effects structure of the model is flexible, so the two-group comparison can be generalised to several groups or to a regression setting. We apply our approach to characterise differences in mutational processes between clonal and subclonal mutations across 23 cancer types in the PCAWG cohort. We find ubiquitous differential abundance of clonal and subclonal signatures across cancer types, and higher dispersion of signatures in the subclonal group. This indicates greater variability between patients at the subclonal level, possibly due to the presence of different clones with distinct active mutational processes.
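The model described above can be illustrated generatively. The Python sketch below rests on assumed choices that may differ from CompSign's actual parameterisation: Dirichlet-multinomial parameters written as alpha = lambda_g * pi, probabilities pi obtained from K - 1 log-ratios through a softmax with the last signature as baseline, a single fixed group effect, and one multivariate normal random-effect vector per patient. The group-specific precision lambda_g plays the role of the dispersion parameter. All parameter values and helper names (`dm_logpmf`, `probs_from_logratios`) are hypothetical.

```python
from math import lgamma

import numpy as np


def dm_logpmf(x, alpha):
    """Log-probability of count vector x under a Dirichlet-multinomial(alpha)."""
    n, a0 = sum(x), sum(alpha)
    out = lgamma(n + 1) + lgamma(a0) - lgamma(n + a0)
    for xk, ak in zip(x, alpha):
        out += lgamma(xk + ak) - lgamma(xk + 1) - lgamma(ak)
    return out


def probs_from_logratios(eta):
    """Softmax of K-1 log-ratios (last category as baseline) -> K probabilities."""
    z = np.append(eta, 0.0)
    z = np.exp(z - z.max())
    return z / z.sum()


rng = np.random.default_rng(1)
K = 3                                   # number of signatures
beta0 = np.array([0.5, 1.0])            # intercepts: clonal group (log-ratios)
beta1 = np.array([-0.5, 0.3])           # fixed effect of the subclonal group
Sigma = np.array([[0.3, 0.1],           # covariance of the multivariate
                  [0.1, 0.2]])          # patient random effects
precision = {0: 50.0, 1: 10.0}          # group-specific; smaller = more dispersed
n_patients, depth = 20, 200             # patients, mutations per observation

rows = []
for p in range(n_patients):
    # One random-effect vector per patient, shared by both of its observations:
    # this is what induces the within-patient correlation.
    u = rng.multivariate_normal(np.zeros(K - 1), Sigma)
    for g in (0, 1):
        pi = probs_from_logratios(beta0 + beta1 * g + u)
        alpha = precision[g] * pi
        # Dirichlet-multinomial draw: Dirichlet probabilities, then multinomial counts.
        x = rng.multinomial(depth, rng.dirichlet(alpha))
        rows.append((p, g, x, dm_logpmf(x, alpha)))

# Log-likelihood conditional on the simulated random effects. Fitting the model
# would instead maximise the marginal likelihood, integrating u out.
cond_loglik = sum(r[3] for r in rows)
print(len(rows), round(cond_loglik, 1))
```

Under this parameterisation the variance of each count is n * pi_k * (1 - pi_k) * (n + lambda) / (1 + lambda), so a smaller lambda in the subclonal group gives more between-sample variability around the same mean composition, which is one way to picture the higher subclonal dispersion reported above.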

Conclusions: Mutational signature analysis is an expanding field, and we envision our framework being widely used to detect global changes in mutational process activity. Our methodology is available in the R package CompSign, which offers an ample toolkit for analysing and visualising differential abundance of compositional data such as, but not restricted to, mutational signatures.


Source journal

BMC Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 5.70
Self-citation rate: 3.30%
Articles published: 506
Review time: 4.3 months

About the journal: BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series, which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that the work presents some advance in knowledge.