Active learning with human heuristics: an algorithm robust to labeling bias.

Impact factor 3.0; JCR Q2 (Computer Science, Artificial Intelligence)

Frontiers in Artificial Intelligence. Pub Date: 2024-11-19. eCollection Date: 2024-01-01. DOI: 10.3389/frai.2024.1491932

Sriram Ravichandran, Nandan Sudarsanam, Balaraman Ravindran, Konstantinos V Katsikopoulos

Frontiers in Artificial Intelligence, volume 7, article 1491932. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11611880/pdf/


Abstract

Active learning enables prediction models to achieve better performance faster by adaptively querying an oracle for the labels of data points. Sometimes the oracle is a human, for example when a medical diagnosis is provided by a doctor. According to the behavioral sciences, people employ heuristics and might therefore sometimes exhibit biases in labeling. How does modeling the oracle as a human heuristic affect the performance of active learning algorithms? If there is a drop in performance, can one design active learning algorithms robust to labeling bias? The present article provides answers. We investigate two established human heuristics (fast-and-frugal tree, tallying model) combined with four active learning algorithms (entropy sampling, multi-view learning, conventional information density, and our proposal, inverse information density) and three standard classifiers (logistic regression, random forests, support vector machines), and apply their combinations to 15 datasets from domains where people routinely provide labels, such as health, marketing, and transportation. There are two main results. First, we show that if a heuristic provides labels, the performance of active learning algorithms drops significantly, sometimes below random. Hence, it is key to design active learning algorithms that are robust to labeling bias. Our second contribution is to provide such a robust algorithm. The proposed inverse information density algorithm, which is inspired by human psychology, achieves an overall improvement of 87% over the best of the other algorithms. In conclusion, designing and benchmarking active learning algorithms can benefit from incorporating the modeling of human heuristics.
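As a rough illustration of the building blocks named in the abstract, the sketch below shows a tallying heuristic acting as the labeling oracle, plus two query strategies: entropy sampling and conventional information density (uncertainty weighted by average similarity to the rest of the pool). It is not the paper's implementation. The function names, the binary-cue encoding, the `threshold` and `beta` parameters, and the choice of cosine similarity are all assumptions made for illustration. Inverse information density is not sketched, because the abstract does not define it.

```python
import numpy as np


def tallying_oracle(cues, threshold):
    """Hypothetical tallying heuristic: count the binary cues (0/1) that
    favor the positive class and label 1 when the count reaches `threshold`."""
    tally = np.asarray(cues).sum(axis=1)
    return (tally >= threshold).astype(int)


def predictive_entropy(proba):
    """Shannon entropy of each row of class probabilities."""
    p = np.clip(np.asarray(proba, dtype=float), 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)


def entropy_sampling(proba):
    """Index of the pool point whose predicted label is most uncertain."""
    return int(np.argmax(predictive_entropy(proba)))


def information_density_scores(proba, X_pool, beta=1.0):
    """Entropy weighted by mean cosine similarity to the pool (assumed form).
    Points that are both uncertain and representative score highest."""
    X = np.asarray(X_pool, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / np.clip(norms, 1e-12, None)
    similarity = unit @ unit.T
    # Clip at zero so that a fractional beta never sees a negative base.
    density = np.clip(similarity.mean(axis=1), 0.0, None)
    return predictive_entropy(proba) * density ** beta


if __name__ == "__main__":
    # Toy pool: three points, a classifier's class probabilities, binary cues.
    proba = np.array([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]])
    X_pool = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    cues = np.array([[1, 1, 0], [0, 0, 1], [1, 1, 1]])

    query = entropy_sampling(proba)
    labels = tallying_oracle(cues, threshold=2)
    print("query index:", query, "heuristic label:", labels[query])
    print("density scores:", information_density_scores(proba, X_pool))
```

In an active learning loop, the classifier would be retrained after each query on the label returned by `tallying_oracle` rather than on the ground truth, which is how heuristic labeling bias enters the training data.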
