Introduction and Overview

Insup Lee, Joseph Y.-T. Leung, S. Son
DOI: 10.1201/9781420011746.ch1 (https://doi.org/10.1201/9781420011746.ch1)
Journal (as listed): Competition Law and Economics
Citation count: 1

Abstract

How is it that a committee of blockheads can somehow arrive at highly reasoned decisions, despite the weak judgment of the individual members? How can the shaky separate views of a panel of dolts be combined into a single opinion that is very likely to be correct? That this possibility of garnering wisdom from a council of fools can be harnessed and used to advantage may seem far-fetched and implausible, especially in real life. Nevertheless, this unlikely strategy turns out to form the basis of boosting, an approach to machine learning that is the topic of this book. Indeed, at its core, boosting solves hard machine-learning problems by forming a very smart committee of grossly incompetent but carefully selected members.

To see how this might work in the context of machine learning, consider the problem of filtering out spam, or junk email. Spam is a modern-day nuisance, and one that is ideally handled by highly accurate filters that can identify and remove spam from the flow of legitimate email. Thus, to build a spam filter, the main problem is to create a method by which a computer can automatically categorize email as spam (junk) or ham (legitimate). The machine learning approach to this problem prescribes that we begin by gathering a collection of examples of the two classes, that is, a collection of email messages which have been labeled, presumably by a human, as spam or ham. The purpose of the machine learning algorithm is to automatically produce from such data a prediction rule that can be used to reliably classify new examples (email messages) as spam or ham.

For any of us who has ever been bombarded with spam, rules for identifying spam or ham will immediately come to mind. For instance, if an email contains the word Viagra, then it is probably spam. Or, as another example, email from one's spouse is quite likely to be ham. Such individual rules of thumb are far from complete as a means of separating spam from ham. A rule that classifies all email containing Viagra as spam, and all other email as ham, will very often be wrong. On the other hand, such a rule is undoubtedly telling us something useful and nontrivial, and its accuracy, however poor, will nonetheless be significantly better than simply guessing entirely at random as to whether each email is spam or ham. Intuitively, finding these weak …
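
The single rule of thumb described above (any email containing the word Viagra is spam, everything else is ham) can be illustrated with a minimal Python sketch. The eight messages and their labels are hypothetical toy data invented for this illustration, and the names `TOY_EMAILS`, `viagra_rule`, and `accuracy` are likewise assumptions, not anything taken from the book. The sketch only shows that such a rule can be often wrong yet still better than the 50% expected from random guessing on a balanced set; it does not implement boosting itself.

```python
from typing import Callable, List, Tuple

# Hypothetical toy data (not from the source): (message text, true label).
# The set is balanced, so guessing at random is right about 50% of the time.
TOY_EMAILS: List[Tuple[str, str]] = [
    ("Cheap Viagra, order now", "spam"),
    ("Viagra at discount prices", "spam"),
    ("You have won a free cruise", "spam"),
    ("Lowest mortgage rates, act today", "spam"),
    ("Lunch tomorrow at noon?", "ham"),
    ("Minutes from the project meeting", "ham"),
    ("Your flight itinerary is attached", "ham"),
    ("Can you review my draft?", "ham"),
]


def viagra_rule(text: str) -> str:
    """Weak rule of thumb: 'spam' if the message mentions Viagra, else 'ham'."""
    return "spam" if "viagra" in text.lower() else "ham"


def accuracy(rule: Callable[[str], str], emails: List[Tuple[str, str]]) -> float:
    """Fraction of messages for which the rule's prediction matches the label."""
    correct = sum(1 for text, label in emails if rule(text) == label)
    return correct / len(emails)


rule_accuracy = accuracy(viagra_rule, TOY_EMAILS)
print(f"Accuracy of the Viagra rule on the toy data: {rule_accuracy:.2f}")
```

On this toy set the rule misses the two spam messages that never mention Viagra, so it is wrong a quarter of the time, yet it still clearly beats chance. That is the sense in which the abstract calls such a rule weak but useful.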