Reply to "Comment on causality and pathway search in microarray time series experiment"

N. Mukhopadhyay, Snigdhansu Chatterjee
DOI: 10.1093/bioinformatics/btn019 (https://doi.org/10.1093/bioinformatics/btn019)
Journal: Journal of bioinformatics
Publication date: 2008-04-01
Publication type: Journal Article
Citations: 23

Abstract

Reply to "Comment on causality and pathway search in microarray time series experiment"
We thank Professors Nagarajan and Upreti for their interest in our paper, Mukhopadhyay and Chatterjee (2007). There, we propose using Granger causality-based pathway detection in an acyclic, homoscedastic framework for microarray time-series expressions, which are generally short-duration time series involving a very large number of genes. Professors Nagarajan and Upreti point out that in the presence of heteroscedasticity and of a cycle such as ‘gene x regulates the expression of gene y and simultaneously gene y regulates the expression of gene x’, Granger causality tests may not be informative. Here, we adopt the term ‘heteroscedasticity’ (‘homoscedasticity’) to mean that the unconditional variance of the white noise, represented as a bivariate vector in the Euclidean co-ordinate system, is different (same) in different co-ordinate directions. Thus, in essence, if the assumptions about the acyclic and homoscedastic nature of the time series are violated, tests for causality detection may fail. This is an important point, since when a contemporaneous cyclic relationship is present, the notion of causality makes little sense. In the context of economics, Eichler (2007) presents a treatment of contemporaneous correlation as well as Granger causality. Extreme heteroscedasticity may be indicative of improper normalization of gene expressions. At the end of their letter, Dr Nagarajan and Dr Upreti mention the normalization step. Proper normalization should remove wide discrepancies in noise variance; hence microarray datasets are nowadays typically available in a de facto normalized version. The data used in Mukhopadhyay and Chatterjee (2007) are also normalized. However, differences in technical variance, as indicated by Professors Nagarajan and Upreti, may still be present, and that would violate the assumption of our method (as well as that of many other statistical comparison methods relying on a common unknown variance).
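To make concrete what a pairwise Granger test computes in the acyclic, homoscedastic setting described above, the following is a minimal sketch in Python. It is not the implementation used in Mukhopadhyay and Chatterjee (2007): the function names (`granger_f_stat`, `simulate_one_way`), the single lag, the coefficients 0.5 and 0.8, and the simulated data are all assumptions chosen for illustration.

```python
import numpy as np


def _rss(design, target):
    """Residual sum of squares from an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ beta
    return float(resid @ resid)


def granger_f_stat(effect, cause, p=1):
    """F statistic for 'past values of `cause` help predict `effect`'.

    Hypothetical helper: compares an autoregression of `effect` on its own
    p lags with one that also includes p lags of `cause`.
    """
    effect = np.asarray(effect, dtype=float)
    cause = np.asarray(cause, dtype=float)
    n = len(effect)
    target = effect[p:]
    # Lag-k columns, aligned with the target values effect[p], ..., effect[n-1]
    own = [effect[p - k:n - k] for k in range(1, p + 1)]
    other = [cause[p - k:n - k] for k in range(1, p + 1)]
    restricted = np.column_stack([np.ones(n - p)] + own)
    full = np.column_stack([np.ones(n - p)] + own + other)
    rss_r, rss_f = _rss(restricted, target), _rss(full, target)
    df_resid = (n - p) - full.shape[1]
    return ((rss_r - rss_f) / p) / (rss_f / df_resid)


def simulate_one_way(n=200, seed=0):
    """Toy pair with equal noise variances in which y drives x after one step."""
    rng = np.random.default_rng(seed)
    x, y = np.zeros(n), np.zeros(n)
    for t in range(1, n):
        y[t] = 0.5 * y[t - 1] + rng.normal()
        x[t] = 0.8 * y[t - 1] + rng.normal()
    return x, y


x, y = simulate_one_way()
print("F for y -> x:", granger_f_stat(x, y))
print("F for x -> y:", granger_f_stat(y, x))
```

In this toy example the statistic for the true direction (y to x) should be much larger than the one for the reverse direction. The statistic is referred to an F distribution with p and n - 3p - 1 degrees of freedom, so a series of length 12 with one lag leaves only 8 residual degrees of freedom, which is one reason short microarray series are hard to analyze.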
Professor Nagarajan, in review, kindly suggested references for two-gene systems whose time profile may not fit into a homoscedastic, cause-effect framework. Thus, a full vector autoregression structure may be needed to capture their mutual dependence at various lags (including lag zero). It is reasonable to conjecture that multi-gene systems exist whose temporal codependence is extremely complex. Although current knowledge about gene regulatory networks is limited, some biology experts we consulted believe that cyclical patterns may be found in large multi-gene networks as part of a feedback procedure, if they are studied over long enough time spans. A proper approach to elicit such patterns would be to conduct multivariate, possibly non-stationary, time-series analysis with all the genes over a long time horizon. This is not feasible currently, since present state-of-the-art microarray time series experiments are of short duration and typically involve a very large number of genes. Hence, restricting attention to acyclic networks is, in our opinion, a small price to pay to produce informative analysis. Future microarray experiments of longer duration, along with discoveries of biological and chemical properties relating to gene and protein interactions, will no doubt lead to a better understanding of gene networks. We would like to point out that in Model 1 (Equation 2), the two coefficients (subscripted 12 and 21) and the two noise variances need to be known constants for the mathematical displays (4)–(7) to hold. As they stand, displays (4)–(7) are missing an O(n^{-1}) term for each estimated parameter if some (or all) of these four parameters are estimated from data, where n is the length of the time series. Also, the equation for s1 does not account for the fact that, as univariate time series, both x_t and y_t are AR(2) (autoregressive of order 2) processes and not AR(1). Similar comments hold for Model 2 (Equation 11).
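The AR(2) remark can be illustrated with a short derivation. Model 1 itself is not reproduced on this page, so the system below is an assumed form (a two-gene, lag-one system with cross terms only), and the symbols a, b, ε and η are placeholders rather than the notation of the original letters.

```latex
% Assumed two-gene, lag-one system; a and b are placeholder coefficients
\begin{align*}
x_t &= a\,y_{t-1} + \varepsilon_t, &
y_t &= b\,x_{t-1} + \eta_t .
\end{align*}
% Substitute each equation into the other, lagged by one step
\begin{align*}
x_t &= ab\,x_{t-2} + \bigl(\varepsilon_t + a\,\eta_{t-1}\bigr), \\
y_t &= ab\,y_{t-2} + \bigl(\eta_t + b\,\varepsilon_{t-1}\bigr).
\end{align*}
```

If ε_t and η_t are mutually uncorrelated white-noise sequences, the bracketed terms are themselves serially uncorrelated, with variances σ_ε² + a²σ_η² and σ_η² + b²σ_ε² respectively. Under these assumptions each series, viewed on its own, follows an autoregression of order 2 (with a zero lag-one coefficient) rather than order 1.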
The difficulty of modeling microarray time series can be appreciated from the fact that, in the human cell cycle data considered in Mukhopadhyay and Chatterjee (2007), n was 12 in one experiment, while the time series itself was 802-dimensional.
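A quick count shows why a full multivariate analysis is out of reach at these sizes. The figures 12 and 802 come from the text; the choice of an unrestricted lag-one vector autoregression is an assumption made only to have something concrete to count, and the variable names are illustrative.

```python
# Figures quoted in the text
n_timepoints = 12   # length of one series
n_genes = 802       # dimension of the series

# Assumption for illustration: an unrestricted lag-one vector autoregression
coeffs_per_gene = n_genes          # one coefficient per candidate regulator
total_coeffs = n_genes * n_genes   # the full coefficient matrix
usable_rows = n_timepoints - 1     # each row pairs time t-1 with time t

print("coefficients per gene equation:", coeffs_per_gene)
print("observations per gene equation:", usable_rows)
print("coefficients in the full matrix:", total_coeffs)
```

Each gene's equation would carry 802 unknown coefficients but only 11 observations, so the unrestricted model cannot be estimated without strong restrictions. A pairwise lag-one regression with an intercept, by contrast, estimates three coefficients from the same 11 rows.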