Assessing the Impact of Automated Document Classification Decisions on Human Decision-Making

Mallory C. Stites, Breannan C. Howell, Phillip Baxley
{"title":"Assessing the Impact of Automated Document Classification Decisions on Human Decision-Making","authors":"Mallory C. Stites, Breannan C. Howell, Phillip Baxley","doi":"10.54941/ahfe1003946","DOIUrl":null,"url":null,"abstract":"As machine learning (ML) algorithms are incorporated into more high-consequence domains, it is important to understand their impact on human decision-making. This need becomes particularly apparent when the goal is to augment performance rather than replace a human analyst. The derivative classification (DC) document review process is an area that is ripe for the application of such ML algorithms. In this process, derivative classifiers (DCs), who are technical experts in specialized topic areas, make decisions about a document’s classification level and category by comparing the document with a classification guide. As the volume of documents to be reviewed continues to increase, and text analytics and other types of models become more accessible, it may be possible to incorporate automated classification suggestions to increase DC efficiency and accuracy. However, care must be taken to ensure that tool-generated suggestions do not introduce errors into the process, which could lead to disastrous impacts for national security. In the current study, we assess the impact of model-generated classification decisions on DC accuracy, response time, and confidence while reviewing document snippets in a controlled environment and compare them to DC performance in the absence of the tool (baseline). Across two assessments, we found that correct tool suggestions improved human accuracy relative to baseline, and decreased response times relative to baseline in one of these assessments. Incorrect tool suggestions produced a higher human error rate but did not impact response times. Interestingly, incorrect tool suggestions also resulted in higher confidence ratings when DCs made errors that aligned with the incorrect suggestion relative to cases in which they correctly disregarded its suggestion. These results highlight that while ML tools can enhance performance when the output is accurate, they also have the potential for impairing analyst decision-making performance if inaccurate. This has the potential for negative impacts on national security. Findings have implications for the incorporation of ML or other automated suggestions not only in the derivative classification domain, but also in other high-consequence domains that incorporate automated tools into a human decision-making process. The effects of factors such as tool accuracy, transparency, and DC expertise should all be taken into account when designing such systems to ensure the automated suggestions improve performance without introducing additional errors. SNL is managed and operated by NTESS under DOE NNSA contract DE-NA0003525","PeriodicalId":102446,"journal":{"name":"Human Factors and Simulation","volume":"3 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Human Factors and Simulation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.54941/ahfe1003946","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

As machine learning (ML) algorithms are incorporated into more high-consequence domains, it is important to understand their impact on human decision-making. This need becomes particularly apparent when the goal is to augment performance rather than replace a human analyst. The derivative classification (DC) document review process is an area that is ripe for the application of such ML algorithms. In this process, derivative classifiers (DCs), who are technical experts in specialized topic areas, make decisions about a document’s classification level and category by comparing the document against a classification guide. As the volume of documents to be reviewed continues to increase, and as text analytics and other types of models become more accessible, it may be possible to incorporate automated classification suggestions to increase DC efficiency and accuracy. However, care must be taken to ensure that tool-generated suggestions do not introduce errors into the process, which could have disastrous consequences for national security. In the current study, we assess the impact of model-generated classification decisions on DC accuracy, response time, and confidence while reviewing document snippets in a controlled environment, and we compare this performance to DC performance in the absence of the tool (baseline). Across two assessments, we found that correct tool suggestions improved human accuracy relative to baseline, and decreased response times relative to baseline in one of the assessments. Incorrect tool suggestions produced a higher human error rate but did not affect response times. Interestingly, incorrect tool suggestions also resulted in higher confidence ratings when DCs made errors that aligned with the incorrect suggestion than when they correctly disregarded it. These results highlight that while ML tools can enhance performance when their output is accurate, they can also impair analyst decision-making when it is not, with potential negative impacts on national security. These findings have implications for incorporating ML or other automated suggestions not only in the derivative classification domain, but also in other high-consequence domains that embed automated tools in a human decision-making process. Factors such as tool accuracy, transparency, and DC expertise should all be taken into account when designing such systems to ensure the automated suggestions improve performance without introducing additional errors.

SNL is managed and operated by NTESS under DOE NNSA contract DE-NA0003525.
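To illustrate the kind of condition-wise comparison the study describes (baseline vs. correct vs. incorrect tool suggestions, measured on accuracy, response time, and confidence), the following is a minimal Python sketch. The data values, column names, and conditions shown here are hypothetical placeholders for illustration only and are not drawn from the paper.

```python
# Minimal sketch (hypothetical data and column names, not from the paper):
# summarize reviewer accuracy, response time, and confidence across three
# conditions -- no tool suggestion (baseline), correct suggestion, and
# incorrect suggestion.
import pandas as pd

# Each row represents one document-snippet trial by one derivative classifier.
trials = pd.DataFrame({
    "condition":  ["baseline", "correct", "incorrect",
                   "baseline", "correct", "incorrect"],
    "correct":    [1, 1, 0, 0, 1, 1],        # 1 = decision matched ground truth
    "rt_seconds": [42.0, 31.5, 40.2, 55.1, 28.9, 47.3],
    "confidence": [4, 5, 4, 3, 5, 2],        # 1-5 self-reported confidence
})

# Aggregate per condition: accuracy, mean response time, mean confidence.
summary = trials.groupby("condition").agg(
    accuracy=("correct", "mean"),
    mean_rt=("rt_seconds", "mean"),
    mean_confidence=("confidence", "mean"),
)
print(summary)
```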