BAD-FM: Backdoor Attacks Against Factorization-Machine Based Neural Network for Tabular Data Prediction

IF 1.6 · Zone 4 (Computer Science) · Q3 ENGINEERING, ELECTRICAL & ELECTRONIC

Chinese Journal of Electronics · Pub Date: 2024-07-22 · DOI: 10.23919/cje.2023.00.041

Lingshuo Meng, Xueluan Gong, Yanjiao Chen

Volume 33, Issue 4, pp. 1077-1092 · IEEE Xplore: https://ieeexplore.ieee.org/document/10606191/

Citations: 0

Abstract

Backdoor attacks pose great threats to deep neural network models. All existing backdoor attacks are designed for unstructured data (image, voice, and text), but not structured tabular data, which has wide real-world applications, e.g., recommendation systems, fraud detection, and click-through rate prediction. To bridge this research gap, we make the first attempt to design a backdoor attack framework, named BAD-FM, for tabular data prediction models. Unlike images or voice samples composed of homogeneous pixels or signals with continuous values, tabular data samples contain well-defined heterogeneous fields that are usually sparse and discrete. Tabular data prediction models do not solely rely on deep networks but combine shallow components (e.g., factorization machine, FM) with deep components to capture sophisticated feature interactions among fields. To tailor the backdoor attack framework to tabular data models, we carefully design field selection and trigger formation algorithms to intensify the influence of the trigger on the backdoored model. We evaluate BAD-FM with extensive experiments on four datasets, i.e., HUAWEI, Criteo, Avazu, and KDD. The results show that BAD-FM can achieve an attack success rate as high as 100% at a poisoning ratio of 0.001%, outperforming baselines adapted from existing backdoor attacks against unstructured data models. As tabular data prediction models are widely adopted in finance and commerce, our work may raise alarms on the potential risks of these models and spur future research on defenses.
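To make the notions of trigger, poisoning ratio, and attack success rate concrete, here is a minimal sketch of generic data poisoning on categorical tabular rows. It is not the BAD-FM method: the abstract does not describe the field selection or trigger formation algorithms, so the trigger fields and values here are simply supplied by the caller. Relabeling poisoned rows to a target label is also an assumption, and all names (`apply_trigger`, `poison_dataset`, `attack_success_rate`) are hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]  # one tabular sample: field name -> categorical value


def apply_trigger(row: Row, trigger: Row) -> Row:
    """Return a copy of `row` with the trigger values written into its fields."""
    stamped = dict(row)
    stamped.update(trigger)
    return stamped


def poison_dataset(
    data: List[Tuple[Row, int]],
    trigger: Row,
    target_label: int,
    poisoning_ratio: float,
    seed: int = 0,
) -> List[Tuple[Row, int]]:
    """Stamp the trigger on a random fraction of samples and relabel them.

    Assumption: poisoned rows are relabeled to `target_label`; at least one
    row is always poisoned, even for very small ratios.
    """
    rng = random.Random(seed)
    n_poison = max(1, round(len(data) * poisoning_ratio))
    chosen = set(rng.sample(range(len(data)), n_poison))
    return [
        (apply_trigger(row, trigger), target_label) if i in chosen else (dict(row), label)
        for i, (row, label) in enumerate(data)
    ]


def attack_success_rate(
    predict: Callable[[Row], int],
    clean_rows: List[Row],
    trigger: Row,
    target_label: int,
) -> float:
    """Fraction of trigger-stamped rows that the model maps to the target label."""
    stamped = [apply_trigger(r, trigger) for r in clean_rows]
    hits = sum(1 for r in stamped if predict(r) == target_label)
    return hits / len(stamped)
```

A model would then be trained on the output of `poison_dataset` and evaluated with `attack_success_rate` on held-out clean rows. How much a hand-picked trigger influences the model depends on which fields it occupies, which is the problem the paper's field selection and trigger formation steps are said to address.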

Source journal

Chinese Journal of Electronics (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 3.70

Self-citation rate: 16.70%

Articles published: 342

Review time: 12.0 months

About the journal: CJE focuses on emerging fields of electronics and publishes innovative and transformative research papers. Most papers published in CJE come from universities and research institutes and present original research results. Both theoretical and practical contributions are encouraged, and original research papers reporting novel solutions to hot topics in electronics are strongly recommended.