{"title":"Entropy Estimation: Simulation, Theory and a Case Study","authors":"Ioannis Kontoyiannis","doi":"10.1109/ITW.2006.1633823","DOIUrl":null,"url":null,"abstract":"We consider the statistical problem of estimating the entropy of finite-alphabet data generated from an unknown stationary process. We examine a series of estimators, including: (1) The standard maximum-likelihood or \"plug-in\" estimator; (2) Four different estimators based on the family of Lempel-Ziv compression algorithms; (3) A different plug-in estimator especially tailored to renewal processes; and (4) The natural estimator derived from the Context-Tree Weighting method (CTW). Some of these estimators are well-known, and some are new. We first summarize numerous theoretical properties of these estimators: Conditions for consistency, estimates of their bias and variance, methods for approximating the estimation error and for obtaining confidence intervals. Several new theoretical results are developed. We show how the theory offers preliminary indications results offer guidelines for tuning the parameters involved in the estimation process. Then we present an extensive simulation study on various types of synthetic data and under various conditions. We compare their performance and comment on the strengths and weaknesses of the various methods. For each estimator, we develop a precise method for calculating the estimation error based on any specific data set. Finally we report the performance of these entropy estimators on the (binary) spike trains of 28 neurons recorded simultaneously for a one-hour period from the primary motor and dorsal premotor cortices of a quietly seated monkey not engaged in a task behavior. Based on joint work with Yun Gao and Elie Bienenstock.","PeriodicalId":293144,"journal":{"name":"2006 IEEE Information Theory Workshop - ITW '06 Punta del Este","volume":"26 6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2006-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2006 IEEE Information Theory Workshop - ITW '06 Punta del Este","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ITW.2006.1633823","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Cited by: 1
Abstract
We consider the statistical problem of estimating the entropy of finite-alphabet data generated from an unknown stationary process. We examine a series of estimators, including: (1) the standard maximum-likelihood or "plug-in" estimator; (2) four different estimators based on the family of Lempel-Ziv compression algorithms; (3) a different plug-in estimator tailored specifically to renewal processes; and (4) the natural estimator derived from the Context-Tree Weighting (CTW) method. Some of these estimators are well known, and some are new. We first summarize numerous theoretical properties of these estimators: conditions for consistency, estimates of their bias and variance, and methods for approximating the estimation error and for obtaining confidence intervals. Several new theoretical results are developed, and we show how they offer preliminary guidelines for tuning the parameters involved in the estimation process. We then present an extensive simulation study on various types of synthetic data and under various conditions, comparing the estimators' performance and commenting on the strengths and weaknesses of each method. For each estimator, we develop a precise method for calculating the estimation error on any specific data set. Finally, we report the performance of these entropy estimators on the (binary) spike trains of 28 neurons recorded simultaneously for a one-hour period from the primary motor and dorsal premotor cortices of a quietly seated monkey not engaged in a behavioral task. Based on joint work with Yun Gao and Elie Bienenstock.
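To make item (1) concrete, here is a minimal Python sketch of a plug-in (maximum-likelihood) entropy-rate estimate; the use of overlapping blocks and the choice of `block_len` are illustrative assumptions, not the specification used in the paper.

```python
from collections import Counter
from math import log2
import random

def plugin_entropy(data, block_len=1):
    """Plug-in (maximum-likelihood) entropy estimate, in bits per symbol.

    Computes the empirical entropy of overlapping blocks of length
    `block_len` and divides by `block_len`, a standard way of turning
    the plug-in estimator into an entropy-rate estimate.
    """
    blocks = [tuple(data[i:i + block_len])
              for i in range(len(data) - block_len + 1)]
    n = len(blocks)
    counts = Counter(blocks)
    h_block = -sum((c / n) * log2(c / n) for c in counts.values())
    return h_block / block_len

# Sanity check: i.i.d. fair-coin bits should give roughly 1 bit/symbol.
bits = [random.randint(0, 1) for _ in range(20000)]
print(plugin_entropy(bits, block_len=4))
```

Larger block lengths reduce the bias due to memory in the process but increase the variance for a fixed amount of data, which is exactly the kind of tuning trade-off the abstract alludes to.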
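The Lempel-Ziv-based estimators come in several variants. The sketch below is one generic match-length estimator in the Wyner-Ziv / Ornstein-Weiss spirit; the window size, the matching convention, and the function name are assumptions for illustration and need not coincide with the four variants studied in the paper.

```python
from math import log2

def lz_match_entropy(data, window=None):
    """Rough LZ-style entropy estimate (bits per symbol) from match lengths.

    For each position i past the initial window, L_i is one plus the length
    of the longest prefix of data[i:] that also occurs entirely inside the
    preceding `window` symbols.  Since L_i / log2(window) is asymptotically
    close to 1/H for stationary ergodic sources, the estimate is
    log2(window) * m / sum(L_i) over the m positions considered.
    The search below is naive (roughly O(n * window * match length)),
    so it is only meant for short illustrative sequences.
    """
    n = len(data)
    if window is None:
        window = n // 2  # use the first half of the data as the "database"
    lengths = []
    for i in range(window, n):
        best = 0
        for j in range(i - window, i):
            l = 0
            while j + l < i and i + l < n and data[j + l] == data[i + l]:
                l += 1
            if l > best:
                best = l
        lengths.append(best + 1)
    return log2(window) * len(lengths) / sum(lengths)

# Example: a highly repetitive sequence should give a very low estimate.
seq = [0, 1] * 500
print(lz_match_entropy(seq, window=200))
```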