Estimating the Model Order in Exponential Families

IEEE/CAM Information Theory Workshop at Cornell. Pub Date: 1989-06-25. DOI: 10.1109/ITW.1989.761421

N. Merhav



Abstract

The problem of estimating the order of a statistical model has been widely studied in the literature on time series analysis, information theory, and automatic control. Most of the known order estimation schemes (AIC, BIC, CAT, FPE, MDL, etc.), although based on reasonable ideas, are heuristic in the sense that no particular risk function (involving the true order and its estimate) is optimized. Rather, these methods are derived from various extensions of the maximum likelihood principle. In this talk, a new approach to the model order estimation problem is presented: estimators are sought that achieve a higher exponential rate of decrease in the underestimation probability while keeping the exponential rate of the overestimation probability at a certain prescribed level. This criterion, which is an extension of the Neyman-Pearson criterion, makes it possible to control the tradeoff between the overestimation and underestimation probabilities in a way that is simple and well understood. For the class of statistical models from the exponential family, an order estimator is suggested and shown to be optimal in the above sense; that is, it provides the best tradeoff between the asymptotic exponential rates of the overestimation and underestimation probabilities. The suggested method is strongly related to the generalized likelihood ratio test (GLRT), which is widely used for composite hypothesis testing problems. Several examples of specific models from the exponential family are given: the Gaussian linear regression model, the Gaussian autoregressive model, and the finite-alphabet Markov model. It is also demonstrated that several well-known composite hypothesis testing problems can be formalized in the model order estimation framework and then solved as special cases. The results generalize to models with more than one order to estimate (e.g., the ARMA(p,q) model). It is demonstrated that the computation time is significantly smaller than that of other model order estimation schemes. Another direction for extending the results is estimating the number of states of a general finite-state source, which does not necessarily belong to the exponential family. An interesting relation between the proposed scheme and universal data compression schemes will be pointed out: it can be shown that efficient data compression algorithms can be used as tools for efficient order estimation in the approach described above.
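The abstract does not state the estimator explicitly, so the following is only a minimal sketch of one GLRT-style rule for the finite-alphabet Markov example, not the author's exact method. It assumes the rule "choose the smallest order whose normalized maximized log-likelihood falls within a threshold of that of the largest candidate order". The names `max_log_likelihood`, `estimate_order`, `k_max`, and `lam` are hypothetical, introduced here for illustration; `lam` plays the role of the prescribed level that trades overestimation against underestimation.

```python
import math
from collections import Counter


def max_log_likelihood(seq, k, start):
    """Maximized log-likelihood (in nats) of seq[start:] under an order-k
    Markov model, conditioning on the k preceding symbols.
    Requires start >= k so every symbol has a full context."""
    ctx_counts = Counter()
    pair_counts = Counter()
    for t in range(start, len(seq)):
        ctx = tuple(seq[t - k:t])  # empty tuple when k == 0
        ctx_counts[ctx] += 1
        pair_counts[(ctx, seq[t])] += 1
    # Plugging in the empirical conditional frequencies maximizes the likelihood.
    return sum(c * math.log(c / ctx_counts[ctx])
               for (ctx, _), c in pair_counts.items())


def estimate_order(seq, k_max, lam):
    """Assumed GLRT-style rule: return the smallest k whose normalized
    log-likelihood ratio against order k_max is below the threshold lam."""
    n = len(seq) - k_max  # number of symbols scored, identical for all orders
    top = max_log_likelihood(seq, k_max, k_max)
    for k in range(k_max + 1):
        stat = (top - max_log_likelihood(seq, k, k_max)) / n
        if stat < lam:
            return k
    return k_max  # only reached when lam <= 0


# A period-4 sequence: each symbol is determined by the previous two,
# but not by the previous one alone.
periodic = [0, 0, 1, 1] * 100
print(estimate_order(periodic, k_max=3, lam=0.1))
print(estimate_order(periodic, k_max=3, lam=1.0))
```

In this sketch a small `lam` makes the rule reluctant to accept a low order, while a large `lam` accepts low orders readily and so underestimates more often. That is the kind of tradeoff the criterion in the abstract is meant to control.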

