Effects of machine learning errors on human decision-making: manipulations of model accuracy, error types, and error importance

Laura E Matzen, Zoe N Gastelum, Breannan C Howell, Kristin M Divis, Mallory C Stites

Cognitive Research: Principles and Implications, 9(1), 56 (published 2024-08-26)
DOI: 10.1186/s41235-024-00586-2
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11345344/pdf/
Abstract
This study addressed the cognitive impacts of providing correct and incorrect machine learning (ML) outputs in support of an object detection task. The study consisted of five experiments that manipulated the accuracy and importance of mock ML outputs. In each of the experiments, participants were given the T and L task with T-shaped targets and L-shaped distractors. They were tasked with categorizing each image as target present or target absent. In Experiment 1, they performed this task without the aid of ML outputs. In Experiments 2-5, they were shown images with bounding boxes, representing the output of an ML model. The outputs could be correct (hits and correct rejections), or they could be erroneous (false alarms and misses). Experiment 2 manipulated the overall accuracy of these mock ML outputs. Experiment 3 manipulated the proportion of different types of errors. Experiments 4 and 5 manipulated the importance of specific types of stimuli or model errors, as well as the framing of the task in terms of human or model performance. These experiments showed that model misses were consistently harder for participants to detect than model false alarms. In general, as the model's performance increased, human performance increased as well, but in many cases the participants were more likely to overlook model errors when the model had high accuracy overall. Warning participants to be on the lookout for specific types of model errors had very little impact on their performance. Overall, our results emphasize the importance of considering human cognition when determining what level of model performance and types of model errors are acceptable for a given task.
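The abstract uses the four standard signal-detection outcomes (hits, misses, false alarms, correct rejections) to describe the mock ML outputs. As a purely illustrative aid, not code from the paper, the minimal Python sketch below shows how a (ground truth, model output) pair maps onto these categories; all names in it are hypothetical.

```python
# Minimal sketch (not from the paper): classifying a mock ML output against
# ground truth into the four signal-detection outcomes named in the abstract.

def classify_outcome(target_present: bool, model_flags_target: bool) -> str:
    """Map a (ground truth, model output) pair to a signal-detection category."""
    if target_present and model_flags_target:
        return "hit"                 # model correctly marks a present target
    if target_present and not model_flags_target:
        return "miss"                # model overlooks a present target
    if not target_present and model_flags_target:
        return "false alarm"         # model marks an image with no target
    return "correct rejection"       # model correctly leaves the image unmarked

# Example: tally outcomes over a set of mock trials.
trials = [(True, True), (True, False), (False, True), (False, False)]
counts: dict[str, int] = {}
for truth, flagged in trials:
    outcome = classify_outcome(truth, flagged)
    counts[outcome] = counts.get(outcome, 0) + 1
print(counts)  # {'hit': 1, 'miss': 1, 'false alarm': 1, 'correct rejection': 1}
```

Under this framing, the experiments' manipulations amount to varying how often each category occurs (overall accuracy in Experiment 2, the miss/false-alarm mix in Experiment 3) and how costly particular categories are treated as being (Experiments 4 and 5).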