Parallelizing MCMC with Machine Learning Classifier and Its Criterion Based on Kullback-Leibler Divergence

Tomoki Matsumoto, Yuichiro Kanazawa
arXiv - STAT - Computation, published 2024-06-17. DOI: arxiv-2406.11246 (https://doi.org/arxiv-2406.11246)
Citations: 0

Abstract

In the era of Big Data, analyzing large, high-dimensional datasets presents significant computational challenges. Although Bayesian statistics is well suited to these complex data structures, Markov chain Monte Carlo (MCMC) methods, which are essential for Bayesian estimation, suffer from high computational cost because of their sequential nature. For faster and more effective computation, this paper introduces an algorithm that enhances a parallelizing MCMC method to address this problem. We highlight the critical role of the overlapped area of the posterior distributions obtained after data partitioning, and propose a method that uses a machine learning classifier to identify and extract MCMC draws from that area in order to approximate the actual posterior distribution. Our main contribution is a Kullback-Leibler (KL) divergence-based criterion that simplifies hyperparameter tuning when training the classifier and makes the method nearly hyperparameter-free. Simulation studies validate the efficacy of the proposed methods.
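The abstract does not spell out the algorithm or the exact form of the KL criterion, so the following is only a minimal sketch of two ingredients it mentions: posteriors computed on data partitions, and KL divergence as a measure of how far each one sits from the full-data posterior. Everything concrete here is an assumption made for illustration and not taken from the paper: the toy model (a normal mean with known unit noise variance and a flat prior, so each posterior is Gaussian in closed form), the simulated data, the number of shards, and the helper names `gaussian_kl` and `normal_mean_posterior`.

```python
import math
import random
import statistics


def gaussian_kl(mean_p, var_p, mean_q, var_q):
    """KL(P || Q) in nats for univariate Gaussians P and Q (closed form)."""
    return 0.5 * (
        math.log(var_q / var_p)
        + (var_p + (mean_p - mean_q) ** 2) / var_q
        - 1.0
    )


def normal_mean_posterior(data, noise_var=1.0):
    """Posterior (mean, variance) of a normal mean.

    Assumes a flat prior and known noise variance (illustrative only).
    """
    return statistics.fmean(data), noise_var / len(data)


# Hypothetical toy data: 1200 observations with true mean 2.0.
rng = random.Random(0)
data = [rng.gauss(2.0, 1.0) for _ in range(1200)]

# Partition the data into shards, as a parallel MCMC scheme would.
num_shards = 4
shards = [data[k::num_shards] for k in range(num_shards)]

# The full-data posterior and one posterior per shard.
full_mean, full_var = normal_mean_posterior(data)
sub_posteriors = [normal_mean_posterior(shard) for shard in shards]

# KL divergence from each shard posterior to the full posterior.
kl_to_full = [
    gaussian_kl(m, v, full_mean, full_var) for m, v in sub_posteriors
]

for k, ((m, v), kl) in enumerate(zip(sub_posteriors, kl_to_full)):
    print(f"shard {k}: mean={m:.3f} var={v:.5f} KL to full={kl:.3f}")
```

In this toy setting each shard posterior is wider than the full-data posterior and is centered at a slightly different place, which is why the divergences are strictly positive. In a realistic problem the posteriors would be represented by MCMC draws rather than closed-form Gaussians, and the KL divergence would have to be estimated from those draws.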