A multivariate linear model for investigating the association between gene-module co-expression and a continuous covariate.

IF 0.9 | Zone 4, Mathematics | JCR Q3, Mathematics

Statistical Applications in Genetics and Molecular Biology Pub Date : 2019-03-15 DOI:10.1515/sagmb-2018-0008

Trishanta Padayachee, Tatsiana Khamiakova, Ziv Shkedy, Perttu Salo, Markus Perola, Tomasz Burzykowski


Citations: 0

Abstract

A way to enhance our understanding of the development and progression of complex diseases is to investigate the influence of cellular environments on gene co-expression (i.e. gene-pair correlations). Often, changes in gene co-expression are investigated across two or more biological conditions defined by categorizing a continuous covariate. However, the selection of arbitrary cut-off points may have an influence on the results of an analysis. To address this issue, we use a general linear model (GLM) for correlated data to study the relationship between gene-module co-expression and a covariate like metabolite concentration. The GLM specifies the gene-pair correlations as a function of the continuous covariate. The use of the GLM allows for investigating different (linear and non-linear) patterns of co-expression. Furthermore, the modeling approach offers a formal framework for testing hypotheses about possible patterns of co-expression. In our paper, a simulation study is used to assess the performance of the GLM. The performance is compared with that of a previously proposed GLM that utilizes categorized covariates. The versatility of the model is illustrated by using a real-life example. We discuss the theoretical issues related to the construction of the test statistics and the computational challenges related to fitting of the proposed model.
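The idea of letting a gene-pair correlation vary with a continuous covariate can be illustrated with a minimal sketch. It is not the authors' model or fitting procedure: it assumes a single gene pair, expression values already standardized to mean 0 and variance 1, a hypothetical Fisher-z style link `rho(x) = tanh(a + b * x)`, and a crude grid search in place of a proper optimizer. The names `simulate_pairs`, `log_lik` and `fit_grid` are illustrative only.

```python
import math
import random


def simulate_pairs(n, a, b, seed=0):
    """Draw (x, y1, y2) so that corr(y1, y2 | x) = tanh(a + b * x) (assumed link)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)  # continuous covariate, e.g. a scaled metabolite level
        rho = math.tanh(a + b * x)
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        y1 = z1
        y2 = rho * z1 + math.sqrt(1.0 - rho * rho) * z2
        data.append((x, y1, y2))
    return data


def log_lik(data, a, b):
    """Bivariate standard-normal log-likelihood, dropping the constant -log(2*pi) per pair."""
    total = 0.0
    for x, y1, y2 in data:
        rho = math.tanh(a + b * x)
        one_minus = 1.0 - rho * rho
        quad = (y1 * y1 - 2.0 * rho * y1 * y2 + y2 * y2) / one_minus
        total += -0.5 * math.log(one_minus) - 0.5 * quad
    return total


def fit_grid(data, b_values):
    """Return (log-likelihood, a, b) maximizing the likelihood over a coarse grid."""
    a_values = [i / 10 for i in range(-15, 16)]
    return max((log_lik(data, a, b), a, b) for a in a_values for b in b_values)


data = simulate_pairs(n=1000, a=0.2, b=0.8, seed=1)

# Full model: correlation changes with the covariate.
ll_full, a_hat, b_hat = fit_grid(data, [i / 10 for i in range(-15, 16)])
# Null model: constant correlation (b fixed at 0).
ll_null, a_null, _ = fit_grid(data, [0.0])

# Likelihood-ratio statistic for H0: b = 0 (compare with a chi-square, 1 df).
lrt = 2.0 * (ll_full - ll_null)
print(f"a_hat={a_hat:.1f}, b_hat={b_hat:.1f}, LRT={lrt:.1f}")
```

Because the covariate enters the link directly, no cut-off point has to be chosen; a non-linear pattern could be explored in the same way by adding, for example, a quadratic term to the link.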




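Source journal

Statistical Applications in Genetics and Molecular Biology (Biology: Biochemistry and Molecular Biology)

CiteScore: 1.20

Self-citation rate: 11.10%

Articles published: (no value given)

Review time: 6-12 weeks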

Journal description: Statistical Applications in Genetics and Molecular Biology seeks to publish significant research on the application of statistical ideas to problems arising from computational biology. The focus of the papers should be on the relevant statistical issues, but each paper should contain a succinct description of the biological problem being considered. The range of topics is wide and includes linkage mapping, association studies, gene finding and sequence alignment, protein structure prediction, design and analysis of microarray data, molecular evolution and phylogenetic trees, DNA topology, and database search strategies. Both original research and review articles will be warmly received.