Differential transcript usage analysis incorporating quantification uncertainty via compositional measurement error regression modeling.

Impact factor 1.8; Zone 3 (Mathematics); JCR Q3, Mathematical & Computational Biology

Biostatistics Pub Date : 2024-04-15 DOI:10.1093/biostatistics/kxad008

Amber M Young, Scott Van Buren, Naim U Rashid

Biostatistics, pp. 559-576. Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11017126/pdf/


Abstract

Differential transcript usage (DTU) occurs when the relative expression of multiple transcripts arising from the same gene changes between different conditions. Existing approaches to detect DTU often rely on computational procedures that can have speed and scalability issues as the number of samples increases. Here we propose a new method, CompDTU, that uses compositional regression to model the relative abundance proportions of each transcript that are of interest in DTU analyses. This procedure leverages fast matrix-based computations that make it ideally suited for DTU analysis with larger sample sizes. This method also allows for the testing of and adjustment for multiple categorical or continuous covariates. Additionally, many existing approaches for DTU ignore quantification uncertainty in the expression estimates for each transcript in RNA-seq data. We extend our CompDTU method to incorporate quantification uncertainty, leveraging common output from RNA-seq expression quantification tools, in a novel method, CompDTUme. Through several power analyses, we show that CompDTU has excellent sensitivity and reduces false positive results relative to existing methods. Additionally, given sufficient sample size, CompDTUme further improves performance over CompDTU for genes with high levels of quantification uncertainty, while also maintaining favorable speed and scalability. We motivate our methods using data from The Cancer Genome Atlas Breast Invasive Carcinoma data set, specifically using RNA-seq data from primary tumors for 740 patients with breast cancer. We show greatly reduced computation time from our new methods as well as the ability to detect several novel genes with significant DTU across different breast cancer subtypes.
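To make the idea of compositional regression on transcript proportions concrete, the following is a minimal sketch and not the authors' implementation. It assumes an additive log-ratio transform with the last transcript as reference, a pseudo-count of 0.5 to avoid zeros, and an ordinary least-squares fit; all function names and the toy counts are hypothetical, and the abstract does not specify any of these choices.

```python
import numpy as np

def usage_proportions(counts, pseudo=0.5):
    """Turn a samples x transcripts count matrix for one gene into proportions.

    The pseudo-count is an illustrative assumption to keep log-ratios finite.
    """
    counts = np.asarray(counts, dtype=float) + pseudo
    return counts / counts.sum(axis=1, keepdims=True)

def alr(props):
    """Additive log-ratio transform; the last transcript is the reference."""
    return np.log(props[:, :-1] / props[:, -1:])

def fit_compositional_regression(counts, design):
    """Least-squares fit of the log-ratio coordinates on a design matrix.

    Returns a (covariates x log-ratio coordinates) coefficient matrix.
    """
    y = alr(usage_proportions(counts))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Toy data (hypothetical): 6 samples, 3 transcripts of one gene, 2 conditions.
counts = np.array([
    [50, 30, 20],
    [55, 25, 20],
    [48, 32, 20],
    [20, 60, 20],
    [22, 58, 20],
    [18, 62, 20],
])
condition = np.array([0, 0, 0, 1, 1, 1])
design = np.column_stack([np.ones(len(condition)), condition])

props = usage_proportions(counts)
coef = fit_compositional_regression(counts, design)
print(coef[1])  # condition effect on each log-ratio coordinate
```

In this toy gene, the second row of `coef` holds the condition effect on each log-ratio: usage shifts away from the first transcript and toward the second. A formal DTU test would compare this fit against a model without the condition column, which the sketch does not attempt.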


Source journal

Biostatistics (Biology: Mathematical & Computational Biology)

CiteScore: 5.10

Self-citation rate: 4.80%

Review time: 6-12 weeks

Journal description: Among the important scientific developments of the 20th century is the explosive growth in statistical reasoning and methods for application to studies of human health. Examples include developments in likelihood methods for inference, epidemiologic statistics, clinical trials, survival analysis, and statistical genetics. Substantive problems in public health and biomedical research have fueled the development of statistical methods, which in turn have improved our ability to draw valid inferences from data. The objective of Biostatistics is to advance statistical science and its application to problems of human health and disease, with the ultimate goal of advancing the public's health.