How to use expert advice

Proceedings of the twenty-fifth annual ACM symposium on Theory of Computing. Pub Date: 1993-06-01. DOI: 10.1145/167088.167198

N. Cesa-Bianchi, Y. Freund, D. Helmbold, D. Haussler, R. Schapire, Manfred K. Warmuth

Citations: 656

Abstract

We analyze algorithms that predict a binary value by combining the predictions of several prediction strategies, called "experts". Our analysis is for worst-case situations, i.e., we make no assumptions about the way the sequence of bits to be predicted is generated. We measure the performance of the algorithm by the difference between the expected number of mistakes it makes on the bit sequence and the expected number of mistakes made by the best expert on this sequence, where the expectation is taken with respect to the randomization in the predictions. We show that the minimum achievable difference is on the order of the square root of the number of mistakes of the best expert, and we give efficient algorithms that achieve this. Our upper and lower bounds have matching leading constants in most cases. We then show how this leads to certain kinds of pattern recognition/learning algorithms with performance bounds that improve on the best results currently known in this context. We also extend our analysis to the case in which log loss is used instead of the expected number of mistakes.
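The setting described in the abstract can be illustrated with a small sketch. The abstract does not spell out the paper's algorithms or their tuning, so the code below is not the authors' method: it is a generic randomized exponential-weights forecaster, offered under stated assumptions. The function name `exponential_weights_forecast`, the learning rate `eta`, and the restriction to experts that output hard 0/1 predictions are all choices made for this illustration. The forecaster predicts 1 with probability equal to the weight share of the experts that predict 1, so its expected number of mistakes can be computed exactly without sampling.

```python
import math


def exponential_weights_forecast(expert_preds, outcomes, eta=0.5):
    """Illustrative randomized exponential-weights forecaster (a sketch only).

    expert_preds[t][i] is the 0/1 prediction of expert i in round t.
    outcomes[t] is the true bit in round t.
    Returns (expected_mistakes, expert_mistakes), where expected_mistakes is
    the expectation over the forecaster's own randomization.
    """
    n = len(expert_preds[0])
    weights = [1.0] * n            # every expert starts with equal weight
    expected_mistakes = 0.0
    expert_mistakes = [0] * n
    penalty = math.exp(-eta)       # multiplicative penalty for a wrong expert

    for preds, y in zip(expert_preds, outcomes):
        total = sum(weights)
        # Probability that the forecaster predicts 1 in this round.
        p_one = sum(w for w, x in zip(weights, preds) if x == 1) / total
        # Probability that this randomized prediction is wrong.
        expected_mistakes += (1.0 - p_one) if y == 1 else p_one
        # Shrink the weight of every expert that was wrong.
        for i, x in enumerate(preds):
            if x != y:
                weights[i] *= penalty
                expert_mistakes[i] += 1

    return expected_mistakes, expert_mistakes
```

Calling the function with one row of expert predictions per round and the matching outcome sequence returns the forecaster's expected mistake count alongside each expert's mistake count. The difference between the first value and `min(expert_mistakes)` is the quantity the abstract uses as its performance measure. For this generic scheme, a standard potential argument gives an expected mistake count of at most (eta * m + ln n) / (1 - exp(-eta)), where m is the best expert's mistake count and n is the number of experts; the leading constants analyzed in the paper are not reproduced here.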
