Bayesian MMSE estimation of classification error and performance on real genomic data

Lori A. Dalton, E. Dougherty
2010 IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS), published 2010-11-01. DOI: 10.1109/GENSIPS.2010.5719674 (https://doi.org/10.1109/GENSIPS.2010.5719674)

Abstract

Small-sample classifier design has become a major issue in the biological and medical communities, owing to the recent development of high-throughput genomic and proteomic technologies. The problem of estimating classifier error, already handicapped by limited available information, is further compounded by the necessity of reusing training data for error estimation. Because error estimation is difficult, all currently popular techniques have been devised heuristically rather than rigorously designed on the basis of statistical inference and optimization. However, a recently proposed error estimator places the problem in an optimal mean-square error (MSE) signal estimation framework in the presence of uncertainty. This results in a Bayesian approach to error estimation based on a parameterized family of feature-label distributions. These Bayesian error estimators are optimal when averaged over a given family of distributions, unbiased when averaged over a given family and all samples, and analytically address a trade-off between robustness (modeling assumptions) and accuracy (minimum mean-square error). Closed-form solutions have been provided for two important examples: the discrete classification problem and linear classification of Gaussian distributions. Here we discuss the Bayesian minimum mean-square error (MMSE) error estimator and demonstrate its performance on real biological data under Gaussian modeling assumptions.
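
To make the discrete case mentioned in the abstract concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not necessarily the authors' exact formulation. The assumptions are: features fall into a finite number of bins, the bin probabilities of each class have independent Dirichlet priors, the class-0 probability has a beta prior, and the sample is drawn at random from the mixture. Under these assumptions the posterior mean of the true error, which is the MMSE estimate, reduces to a weighted sum of posterior-mean bin probabilities. The function name and all parameter names are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def bayesian_mmse_error_discrete(
    counts0: Sequence[int],
    counts1: Sequence[int],
    labels: Sequence[int],
    alpha0: Optional[Sequence[float]] = None,
    alpha1: Optional[Sequence[float]] = None,
    beta_prior: Tuple[float, float] = (1.0, 1.0),
) -> float:
    """Posterior-mean (MMSE) error estimate for a fixed discrete classifier.

    counts0[i], counts1[i]: training points of class 0 / class 1 in bin i.
    labels[i]: label (0 or 1) the classifier assigns to bin i.
    alpha0, alpha1: Dirichlet prior parameters (assumed; default is uniform).
    beta_prior: Beta(a, b) prior on the class-0 probability (assumed).
    """
    num_bins = len(labels)
    if len(counts0) != num_bins or len(counts1) != num_bins:
        raise ValueError("counts and labels must have one entry per bin")
    if alpha0 is None:
        alpha0 = [1.0] * num_bins
    if alpha1 is None:
        alpha1 = [1.0] * num_bins

    n0, n1 = sum(counts0), sum(counts1)
    a, b = beta_prior

    # Posterior mean of the class-0 probability under the beta prior.
    expected_c = (n0 + a) / (n0 + n1 + a + b)

    # Posterior mean of each bin probability under the Dirichlet priors,
    # summed over the bins where that class is misclassified.
    denom0 = n0 + sum(alpha0)
    denom1 = n1 + sum(alpha1)
    err0 = sum(
        (counts0[i] + alpha0[i]) / denom0 for i in range(num_bins) if labels[i] == 1
    )
    err1 = sum(
        (counts1[i] + alpha1[i]) / denom1 for i in range(num_bins) if labels[i] == 0
    )

    # The class probability and the bin probabilities are independent a
    # posteriori under the stated assumptions, so the expectation factors.
    return expected_c * err0 + (1.0 - expected_c) * err1


# Two bins, eight training points, majority-vote labels per bin.
estimate = bayesian_mmse_error_discrete([3, 1], [1, 3], labels=[0, 1])
print(estimate)  # about 0.333; resubstitution on the same data gives 2/8 = 0.25
```

With uniform priors the estimate is pulled away from the resubstitution value, because the priors spread some posterior mass onto the bins where each class is misclassified. More informative priors would correspond to stronger modeling assumptions, which is the robustness versus accuracy trade-off the abstract refers to.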