Facilitating the Calculation of the Efficient Score Using Symbolic Computing.

IF 1.8 | Zone 4, Mathematics | Q1 Statistics & Probability

American Statistician Pub Date : 2018-01-01 Epub Date: 2017-10-30 DOI:10.1080/00031305.2017.1392361

Alexander Sibley, Zhiguo Li, Yu Jiang, Yi-Ju Li, Cliburn Chan, Andrew Allen, Kouros Owzar

American Statistician, volume 72, issue 2, pages 199-205.

Citations: 1

Abstract

The score statistic continues to be a fundamental tool for statistical inference. In the analysis of data from high-throughput genomic assays, inference based on the score usually enjoys greater stability and considerably higher computational efficiency, and lends itself more readily to resampling methods, than the asymptotically equivalent Wald or likelihood ratio tests. The score function often depends on a set of unknown nuisance parameters that have to be replaced by estimators; inference can be improved by calculating the efficient score, which accounts for the variability induced by estimating these parameters. Manual derivation of the efficient score is tedious and error-prone, so we illustrate the use of computer algebra to facilitate this derivation. We demonstrate this process in the context of a standard example from genetic association analyses, though the techniques shown here could be applied to any derivation and have a place in the toolbox of any modern statistician. We further show how the resulting symbolic expressions can be readily ported to compiled languages to develop fast numerical algorithms for high-throughput genomic analysis. We conclude by considering extensions of this approach. The code featured in this report is available online as part of the supplementary material.
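To make the idea in the abstract concrete, here is a minimal sketch of deriving an efficient score with a computer algebra system. It is not the authors' supplementary code. The choice of SymPy, the toy model (a normal linear model y_i ~ N(alpha + beta * g_i, 1) with genotype g_i, nuisance intercept alpha, and parameter of interest beta), the fixed sample size n = 3, and all variable names are assumptions made for illustration. The sketch uses the standard form of the efficient score, U_beta - I_{beta,alpha} I_{alpha,alpha}^{-1} U_alpha, evaluated under the null beta = 0, and then exports the result as a C expression.

```python
import sympy as sp

# Assumed toy setting: n = 3 observations, so every sum is an explicit expression.
n = 3
y = sp.symbols(f"y1:{n + 1}", real=True)  # responses y1, y2, y3
g = sp.symbols(f"g1:{n + 1}", real=True)  # genotypes g1, g2, g3
alpha, beta = sp.symbols("alpha beta", real=True)

# Log-likelihood of y_i ~ N(alpha + beta * g_i, 1), up to an additive constant.
loglik = sum(-(y[i] - alpha - beta * g[i]) ** 2 / 2 for i in range(n))

# Score components for the parameter of interest and the nuisance parameter.
U_beta = sp.diff(loglik, beta)
U_alpha = sp.diff(loglik, alpha)

# Information entries as the negative Hessian. In this model they do not
# involve y, so observed and expected information coincide.
I_bb = -sp.diff(loglik, beta, 2)
I_ba = -sp.diff(loglik, beta, alpha)
I_aa = -sp.diff(loglik, alpha, 2)

# Efficient score for beta under the null hypothesis beta = 0.
efficient_score = sp.simplify((U_beta - I_ba / I_aa * U_alpha).subs(beta, 0))

# Variance of the efficient score (the efficient information).
efficient_info = sp.simplify(I_bb - I_ba ** 2 / I_aa)

# Hand-derived closed form for comparison: sum_i (g_i - gbar) * y_i.
gbar = sum(g) / n
expected = sum((g[i] - gbar) * y[i] for i in range(n))

# Port the symbolic result to a C expression string.
c_expression = sp.ccode(efficient_score)

if __name__ == "__main__":
    print(efficient_score)
    print(efficient_info)
    print(c_expression)
```

In this toy model the nuisance intercept cancels out of the efficient score, which is the kind of simplification that is easy to miss or get wrong by hand. For models where the information depends on the response, the expectation step would need to be handled explicitly; that step is not shown here.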

Source journal

American Statistician (Mathematics: Statistics & Probability)

CiteScore

3.50

Self-citation rate

5.60%

Articles published

Review time

>12 weeks

About the journal: Are you looking for general-interest articles about current national and international statistical problems and programs; interesting and fun articles of a general nature about statistics and its applications; or the teaching of statistics? Then you are looking for The American Statistician (TAS), published quarterly by the American Statistical Association. TAS contains timely articles organized into the following sections: Statistical Practice, General, Teacher's Corner, History Corner, Interdisciplinary, Statistical Computing and Graphics, Reviews of Books and Teaching Materials, and Letters to the Editor.