Groups of experts often differ in their decisions: What are the implications for AI and machine learning? A commentary on Noise: A Flaw in Human Judgment, by Kahneman, Sibony, and Sunstein (2021)

IF 2.5 | Zone 4: Computer Science | JCR Q3: Computer Science, Artificial Intelligence

Ai Magazine Pub Date : 2023-10-26 DOI:10.1002/aaai.12135

Derek H. Sleeman, Ken Gilhooly

Ai Magazine, vol. 44, no. 4, pp. 555-567. Journal Article. Full text: https://onlinelibrary.wiley.com/doi/10.1002/aaai.12135

Citations: 0

Abstract

Machine Learning systems rely heavily on annotated instances. Such annotations are frequently done by human experts, or by tools developed by experts, and so the central message of this book, Noise: A Flaw in Human Judgment (Kahneman, Sibony, and Sunstein 2021), is of considerable importance to the AI/Machine Learning community. The core message is that if a number of experts are asked to annotate tasks that involve judgments, their responses will frequently differ. This observation poses a problem for analysts: whether to choose a particular annotated dataset (from the group), to process the set of responses to give a “balanced” response, or to reject all the annotated datasets. A further important aspect of this book is the case studies, which demonstrate that differences in judgments between fellow experts have been reported in a significant number of disciplines, including business, the law, government, and medicine. Kahneman, Sibony, and Sunstein (2021), referred to as KSS subsequently, discuss how Expert Biases can be reduced, but the main focus of this book is a discussion of Noise, that is, differences that often occur between fellow experts, and how Noise can often be reduced. To address the last point, KSS have formulated a set of six decision hygiene principles, which include the recommendation that complex tasks should be subdivided and each subtask then solved separately. A further principle is that each task should be solved by individual experts before the various judgments are discussed with fellow experts.
Effectively, the book being reviewed covers three main topics. First, it reports several motivating studies that show how judgments of fellow experts varied significantly in the pricing of insurance premiums and in setting the lengths of custodial sentences. These motivating studies very effectively illustrate the central concepts of Judgment, Noise, and Bias; that section also provides definitions of these core concepts and discusses how Noise is often amplified in group meetings. Secondly, the authors provide detailed discussion of further studies, in a variety of domains, which report the levels of disagreement between experts. Thirdly, KSS discuss how to reduce the levels of Noise between experts; as noted above, the authors refer to these methods as Principles of Noise Hygiene. These three parts are interwoven in a complex way throughout the book; in our view, the best overview of the book is given in the section Review and Conclusions: Taking Noise Seriously (KSS, p. 361).
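
The choice the abstract describes (keep one expert's labels, combine the responses into a "balanced" label, or reject the annotations) can be illustrated with a small sketch. This is not a method from the commentary or from KSS: it is a plain majority vote with an agreement threshold, and the function name `aggregate_labels`, the `min_agreement` value, and the example data are all hypothetical.

```python
from collections import Counter


def aggregate_labels(annotations, min_agreement=0.6):
    """Combine expert labels per item by majority vote.

    annotations: dict mapping an item id to a non-empty list of labels,
    one label per expert (hypothetical input format).
    Returns a dict mapping each item id to (label, agreement), where label
    is None when the share of experts backing the most common label falls
    below min_agreement, i.e. the item is rejected as too noisy.
    """
    results = {}
    for item, labels in annotations.items():
        counts = Counter(labels)
        top_label, top_count = counts.most_common(1)[0]
        agreement = top_count / len(labels)
        accepted = top_label if agreement >= min_agreement else None
        results[item] = (accepted, agreement)
    return results


# Hypothetical judgments from three experts on three cases.
annotations = {
    "case_1": ["high", "high", "high"],   # full agreement
    "case_2": ["high", "low", "high"],    # two of three agree
    "case_3": ["high", "low", "medium"],  # no majority
}

consensus = aggregate_labels(annotations)
for item, (label, agreement) in consensus.items():
    print(item, label, round(agreement, 2))
```

With these assumed inputs, the first two cases receive the label "high" and the third is rejected. In practice the threshold, and whether rejected items are re-annotated or dropped, would depend on the task.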

Source journal

Ai Magazine (Engineering and Technology; Computer Science: Artificial Intelligence)

CiteScore: 3.90
Self-citation rate: 11.10%
Articles published:
Review time: >12 weeks

Journal description: AI Magazine publishes original articles that are reasonably self-contained and aimed at a broad spectrum of the AI community. Technical content should be kept to a minimum. In general, the magazine does not publish articles that have been published elsewhere in whole or in part. The magazine welcomes the contribution of articles on the theory and practice of AI as well as general survey articles, tutorial articles on timely topics, conference or symposia or workshop reports, and timely columns on topics of interest to AI scientists.