BAD-FM: Backdoor Attacks Against Factorization-Machine Based Neural Network for Tabular Data Prediction

IF 1.6 · Zone 4, Computer Science · JCR Q3, ENGINEERING, ELECTRICAL & ELECTRONIC
Lingshuo Meng;Xueluan Gong;Yanjiao Chen
DOI: 10.23919/cje.2023.00.041
Journal: Chinese Journal of Electronics
Published: 2024-07-22 (Journal Article)
Full text: https://ieeexplore.ieee.org/document/10606191/

Abstract

Backdoor attacks pose great threats to deep neural network models. All existing backdoor attacks are designed for unstructured data (image, voice, and text), but not structured tabular data, which has wide real-world applications, e.g., recommendation systems, fraud detection, and click-through rate prediction. To bridge this research gap, we make the first attempt to design a backdoor attack framework, named BAD-FM, for tabular data prediction models. Unlike images or voice samples composed of homogeneous pixels or signals with continuous values, tabular data samples contain well-defined heterogeneous fields that are usually sparse and discrete. Tabular data prediction models do not solely rely on deep networks but combine shallow components (e.g., factorization machine, FM) with deep components to capture sophisticated feature interactions among fields. To tailor the backdoor attack framework to tabular data models, we carefully design field selection and trigger formation algorithms to intensify the influence of the trigger on the backdoored model. We evaluate BAD-FM with extensive experiments on four datasets, i.e., HUAWEI, Criteo, Avazu, and KDD. The results show that BAD-FM can achieve an attack success rate as high as 100% at a poisoning ratio of 0.001%, outperforming baselines adapted from existing backdoor attacks against unstructured data models. As tabular data prediction models are widely adopted in finance and commerce, our work may raise alarms on the potential risks of these models and spur future research on defenses.
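
The abstract notes that tabular prediction models pair a shallow factorization machine (FM) with deep components to capture feature interactions among fields. As a rough illustration of the shallow part only, the sketch below computes a textbook second-order FM score in plain Python. It is not taken from the paper: the function names, the toy input, and the parameter values are all hypothetical, and BAD-FM's field selection and trigger formation algorithms are not represented here.

```python
from itertools import combinations


def fm_pairwise_naive(x, V):
    """Sum of <V[i], V[j]> * x[i] * x[j] over all pairs i < j."""
    total = 0.0
    for i, j in combinations(range(len(x)), 2):
        dot = sum(a * b for a, b in zip(V[i], V[j]))
        total += dot * x[i] * x[j]
    return total


def fm_pairwise_fast(x, V):
    """Same quantity via the usual linear-time rewrite:
    0.5 * sum_f [ (sum_i V[i][f] x[i])^2 - sum_i (V[i][f] x[i])^2 ].
    """
    k = len(V[0])
    total = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        total += 0.5 * (s * s - s_sq)
    return total


def fm_predict(x, w0, w, V):
    """Bias + linear term + pairwise interaction term."""
    linear = sum(wi * xi for wi, xi in zip(w, x))
    return w0 + linear + fm_pairwise_fast(x, V)


# Hypothetical toy input: four one-hot style features, 2-dimensional embeddings.
x = [1.0, 0.0, 1.0, 1.0]
V = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0], [-1.0, 1.0]]
w0 = 0.5
w = [0.1, 0.2, 0.3, 0.4]

score = fm_predict(x, w0, w, V)
```

Because every pair of active features contributes to the score through its embedding inner product, changing the values of a few fields shifts many interaction terms at once, which is one plausible reason the choice of trigger fields matters for models of this kind.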
Source journal
Chinese Journal of Electronics (Engineering & Technology: Engineering, Electrical & Electronic)
CiteScore: 3.70
Self-citation rate: 16.70%
Articles published: 342
Review time: 12.0 months
Journal description: CJE focuses on emerging fields of electronics and publishes innovative and transformative research papers. Most papers published in CJE come from universities and research institutes. Both theoretical and practical contributions are encouraged, and original research papers reporting novel solutions to hot topics in electronics are strongly recommended.