Human performance in detecting deepfakes: A systematic review and meta-analysis of 56 papers

Alexander Diel, Tania Lalgi, Isabel Carolin Schröter, Karl F. MacDorman, Martin Teufel, Alexander Bäuerle
Computers in Human Behavior Reports, Volume 16, Article 100538. Published 2024-12-01. Impact factor 4.9; JCR Q1 (Psychology, Experimental).
DOI: 10.1016/j.chbr.2024.100538
https://www.sciencedirect.com/science/article/pii/S2451958824001714

Abstract

Deepfakes are AI-generated media designed to look real, often with the intent to deceive. Deepfakes threaten public and personal safety by facilitating disinformation, propaganda, and identity theft. Though research has been conducted on human performance in deepfake detection, the results have not yet been synthesized. This systematic review and meta-analysis investigates human deepfake detection accuracy. Searches in PubMed, ScienceGov, JSTOR, Google Scholar, and paper references, conducted in June and October 2024, identified empirical studies measuring human detection of high-quality deepfakes. After pooling accuracy, odds-ratio, and sensitivity (d') effect sizes (k = 137 effects) from 56 papers involving 86,155 participants, we analyzed 1) overall deepfake detection performance, 2) performance across stimulus types (audio, image, text, and video), and 3) the effects of detection-improvement strategies. Overall deepfake detection rates (sensitivity) were not significantly above chance because 95% confidence intervals crossed 50%. Total deepfake detection accuracy was 55.54% (95% CI [48.87, 62.10], k = 67). For audio, accuracy was 62.08% [38.23, 83.18], k = 8; for images, 53.16% [42.12, 64.64], k = 18; for text, 52.00% [37.42, 65.88], k = 15; and for video, 57.31% [47.80, 66.57], k = 26. Odds ratios were 0.64 [0.52, 0.79], k = 62, indicating 39% detection accuracy, below chance (audio 45%, image 35%, text 40%, video 40%). Moreover, d' values show no significant difference from chance. However, strategies like feedback training, AI support, and deepfake caricaturization improved detection performance above chance levels (65.14% [55.21, 74.46], k = 15), especially for video stimuli.
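The abstract's arithmetic can be checked with a short sketch. It assumes that the pooled odds ratio is read as the odds of a correct response, so that the implied accuracy is odds / (1 + odds), and that "not significantly different from chance" means the 95% confidence interval contains the chance level (50% accuracy, or odds of 1). The abstract does not state the authors' exact conversion; the function names below are hypothetical and used only for illustration.

```python
def odds_to_probability(odds: float) -> float:
    """Convert odds p / (1 - p) into the probability p.

    Assumption: the pooled odds ratio from the abstract is treated as
    the odds of a correct detection, with odds of 1 meaning chance.
    """
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)


def ci_excludes_chance(lower: float, upper: float, chance: float) -> bool:
    """Return True when the interval [lower, upper] does not contain chance."""
    return lower > chance or upper < chance


# Pooled odds ratio and its 95% CI, as reported in the abstract.
pooled_odds, odds_lower, odds_upper = 0.64, 0.52, 0.79
implied_accuracy = odds_to_probability(pooled_odds) * 100
print(f"Implied accuracy from odds {pooled_odds}: {implied_accuracy:.1f}%")

# Odds CI lies entirely below 1, so performance is below chance.
print("Odds CI excludes 1:", ci_excludes_chance(odds_lower, odds_upper, 1.0))

# Accuracy CIs (in percent) from the abstract, compared against 50%.
overall_ci = (48.87, 62.10)      # overall accuracy, k = 67
strategies_ci = (55.21, 74.46)   # with detection-improvement strategies, k = 15
print("Overall CI excludes 50%:", ci_excludes_chance(*overall_ci, 50.0))
print("Strategies CI excludes 50%:", ci_excludes_chance(*strategies_ci, 50.0))
```

Under these assumptions, odds of 0.64 correspond to roughly 39% accuracy, matching the figure in the abstract. The overall accuracy interval contains 50%, while the interval for detection-improvement strategies lies entirely above it.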