Highest Posterior Model Computation and Variable Selection via Simulated Annealing

The New England Journal of Statistics in Data Science. Pub Date: 2023-01-01. DOI: 10.51387/23-nejsds40

A. Maity, S. Basu

Citations: 1

Abstract

Variable selection is widely used across application areas of data analytics, ranging from optimal selection of genes in large-scale microarray studies, to optimal selection of biomarkers for targeted therapy in cancer genomics, to selection of optimal predictors in business analytics. A formal way to perform this selection under the Bayesian approach is to select the model with the highest posterior probability. The problem may be thought of as an optimization problem over the model space, where the objective function is the posterior probability of the model. We propose to carry out this optimization using simulated annealing, and we illustrate its feasibility in high-dimensional problems. Various simulation studies show the approach to be efficient. Theoretical justifications are provided, and applications to high-dimensional datasets are discussed. The proposed method is implemented in the R package sahpm, which is available on CRAN.
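The idea of searching the model space by simulated annealing can be illustrated with a minimal sketch. It is not the authors' algorithm and not the sahpm interface. Everything in it is an assumption made for illustration: the score is -BIC/2, used here as a rough stand-in for the log posterior model probability under a uniform model prior; proposals flip one variable in or out; the temperature decays geometrically; and the names simulate, log_score, and anneal are hypothetical.

```python
import math
import random

def rss(X, y, subset):
    """Residual sum of squares for y regressed on an intercept plus the columns in subset."""
    cols = [[1.0] * len(y)] + [[row[j] for row in X] for j in subset]
    k = len(cols)
    # Augmented normal equations [Z'Z | Z'y], where Z holds the selected columns.
    A = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(k)]
         + [sum(a * b for a, b in zip(cols[i], y))] for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for i in range(k):
        pivot = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[pivot] = A[pivot], A[i]
        for r in range(i + 1, k):
            f = A[r][i] / A[i][i]
            for c in range(i, k + 1):
                A[r][c] -= f * A[i][c]
    # Back substitution for the coefficient vector.
    beta = [0.0] * k
    for i in reversed(range(k)):
        beta[i] = (A[i][k] - sum(A[i][j] * beta[j] for j in range(i + 1, k))) / A[i][i]
    return sum((yt - sum(b * col[t] for b, col in zip(beta, cols))) ** 2
               for t, yt in enumerate(y))

def log_score(X, y, subset):
    """Assumed objective: -BIC/2, a stand-in for the log posterior model probability."""
    n = len(y)
    return -0.5 * (n * math.log(rss(X, y, subset) / n) + (len(subset) + 1) * math.log(n))

def anneal(X, y, n_iter=2000, t0=1.0, cooling=0.995, seed=0):
    """Simulated annealing over variable subsets; returns the best subset visited."""
    rng = random.Random(seed)
    p = len(X[0])
    current = frozenset()
    cur_score = log_score(X, y, sorted(current))
    best, best_score = current, cur_score
    temp = t0
    for _ in range(n_iter):
        proposal = current ^ {rng.randrange(p)}  # flip one variable in or out
        new_score = log_score(X, y, sorted(proposal))
        # Always accept improvements; accept worse models with probability exp(diff / temp).
        if new_score >= cur_score or rng.random() < math.exp((new_score - cur_score) / temp):
            current, cur_score = proposal, new_score
            if cur_score > best_score:
                best, best_score = current, cur_score
        temp *= cooling  # geometric cooling schedule (an assumption)
    return sorted(best), best_score

def simulate(n=100, p=8, seed=1):
    """Toy data: only columns 1 and 4 affect the response."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [2.0 * row[1] - 3.0 * row[4] + rng.gauss(0, 1) for row in X]
    return X, y

X, y = simulate()
selected, score = anneal(X, y)
print("selected variables:", selected, "score:", round(score, 2))
```

On this toy data the search starts from the empty model and should pick up the two active columns quickly, since adding either one raises the score by far more than the penalty term. In a real high-dimensional problem the score would be replaced by the actual posterior probability under the chosen prior, and the cooling schedule would need tuning.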



