Foundation of Calculating Normalized Maximum Likelihood for Continuous Probability Models
Atsushi Suzuki, Kota Fukuzawa, Kenji Yamanishi
arXiv - STAT - Statistics Theory, 2024-09-12
DOI: arxiv-2409.08387 (https://doi.org/arxiv-2409.08387)
Citations: 0
Abstract
The normalized maximum likelihood (NML) code length is widely used as a model
selection criterion based on the minimum description length principle, where
the model with the shortest NML code length is selected. A common method to
calculate the NML code length is to use the sum (for a discrete model) or
integral (for a continuous model) of a function defined by the distribution of
the maximum likelihood estimator. While this method has been proven to
correctly calculate the NML code length of discrete models, no proof has been
provided for continuous cases. Consequently, it has remained unclear whether
the method can accurately calculate the NML code length of continuous models.
In this paper, we solve this problem affirmatively, proving that the method is
also correct for continuous cases. Notably, completing the proof for
continuous cases is non-trivial: it cannot be achieved by merely replacing the
sums in the discrete proof with integrals, because the decomposition trick
applied to sums in the discrete case does not carry over to integrals.
To overcome this, we introduce a novel
decomposition approach based on the coarea formula from geometric measure
theory, which is essential to establishing our proof for continuous cases.
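
As a small illustration of the method discussed in the abstract, the sketch below compares two ways of computing the NML normalizer C, where the NML code length of data x is -log p(x; MLE(x)) + log C. The example is not taken from the paper: it assumes a Gaussian model with known standard deviation, a sample of size n = 2, and data restricted to sample means in [A, B]. All names and constants (SIGMA, A, B, normalizer_direct, normalizer_via_mle) are hypothetical. The first function integrates the maximized likelihood directly over the data space; the second integrates g(mu; mu), the density of the MLE evaluated at the true parameter, over the parameter range.

```python
import math

SIGMA = 1.0      # known standard deviation (illustrative assumption)
A, B = 0.0, 1.0  # allowed range for the sample mean (illustrative assumption)

def max_likelihood_n2(x1, x2):
    """Density of (x1, x2) under N(mu, SIGMA^2), evaluated at mu = sample mean."""
    mean = 0.5 * (x1 + x2)
    sq = (x1 - mean) ** 2 + (x2 - mean) ** 2
    return math.exp(-sq / (2 * SIGMA**2)) / (2 * math.pi * SIGMA**2)

def normalizer_direct(h=0.02):
    """Midpoint-rule integral of the maximized likelihood over {x : A <= mean <= B}."""
    lo, hi = A - 6 * SIGMA, B + 6 * SIGMA  # tails beyond this range are negligible
    steps = round((hi - lo) / h)
    grid = [lo + (i + 0.5) * h for i in range(steps)]
    total = 0.0
    for x1 in grid:
        for x2 in grid:
            if A <= 0.5 * (x1 + x2) <= B:
                total += max_likelihood_n2(x1, x2)
    return total * h * h

def normalizer_via_mle(n=2, steps=1000):
    """Midpoint-rule integral over [A, B] of g(mu; mu), the MLE density at the true mean."""
    h = (B - A) / steps
    # The sample mean is distributed as N(mu, SIGMA^2 / n); its density at mu is constant.
    g = math.sqrt(n / (2 * math.pi * SIGMA**2))
    return sum(g for _ in range(steps)) * h

direct = normalizer_direct()
via_mle = normalizer_via_mle()
print("direct integral over data space:", direct)
print("integral of MLE density        :", via_mle)
print("log normalizer (nats)          :", math.log(via_mle))
```

The two values agree only up to the discretization error of the coarse grid in the direct integral, which is largest near the boundary of the region. In this Gaussian case the level sets of the sample mean are straight lines, so the agreement can also be checked by a linear change of variables; the general continuous case is the one the abstract says requires the coarea formula.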