Protecting machine learning from poisoning attacks: A risk-based approach

IF 4.8 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Computers & Security Pub Date : 2025-04-11 DOI:10.1016/j.cose.2025.104468

Nicola Bena, Marco Anisetti, Ernesto Damiani, Chan Yeob Yeun, Claudio A. Ardagna

Computers & Security, volume 155, Article 104468. Journal Article. https://www.sciencedirect.com/science/article/pii/S0167404825001579

Citations: 0

Abstract

The ever-increasing interest in and widespread diffusion of Machine Learning (ML)-based applications has driven a substantial amount of research into offensive and defensive ML. ML models can be attacked from different angles: poisoning attacks, the focus of this paper, inject maliciously crafted data points into the training set to modify the model behavior; adversarial attacks maliciously manipulate inference-time data points to fool the ML model and steer its predictions according to the attacker’s objective. Ensemble-based techniques are among the most relevant defenses against poisoning attacks and replace the monolithic ML model with an ensemble of ML models trained on different (disjoint) subsets of the training set. These techniques assign data points to the training sets of the models in the ensemble (routing) randomly or using a hash function, assuming that evenly distributing poisoned data points positively influences ML robustness. Our paper departs from this assumption and implements a risk-based ensemble technique where a risk management process is used to perform a smart routing of data points to the training sets. An extensive experimental evaluation demonstrates the effectiveness of the proposed approach in terms of its soundness, robustness, and performance.
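To make the baseline concrete, the hash-based routing that the abstract attributes to existing ensemble defenses can be sketched as follows. This is a minimal illustration only, not the paper's risk-based routing, which the abstract does not detail. The names `route_by_hash`, `partition`, `train_ensemble`, and `ensemble_predict` are hypothetical, and the choice of SHA-256 over `repr(features)` is an assumption made to keep the example deterministic.

```python
import hashlib
from collections import Counter


def route_by_hash(features, n_models):
    """Map a data point to one ensemble member via a stable hash (assumed scheme)."""
    digest = hashlib.sha256(repr(features).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_models


def partition(dataset, n_models):
    """Split (features, label) pairs into disjoint training sets, one per model."""
    parts = [[] for _ in range(n_models)]
    for features, label in dataset:
        parts[route_by_hash(features, n_models)].append((features, label))
    return parts


def train_ensemble(dataset, n_models, train_fn):
    """Train one model per non-empty partition; train_fn is supplied by the caller."""
    return [train_fn(part) for part in partition(dataset, n_models) if part]


def ensemble_predict(models, x):
    """Return the label predicted by the largest number of ensemble members."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]
```

Because each data point lands in exactly one partition, a poisoned point can influence at most one ensemble member under this scheme. The paper's contribution, per the abstract, is to replace the hash in the routing step with a decision driven by a risk management process.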




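Source journal

Computers & Security (Engineering & Technology, Computer Science: Information Systems)

CiteScore: 12.40
Self-citation rate: 7.10%
Articles published: 365
Review time: 10.7 months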

About the journal: Computers & Security is the most respected technical journal in the IT security field. With its high-profile editorial board and informative regular features and columns, the journal is essential reading for IT security professionals around the world. Computers & Security provides you with a unique blend of leading-edge research and sound practical management advice. It is aimed at the professional involved with computer security, audit, control and data integrity in all sectors: industry, commerce and academia. Recognized worldwide as THE primary source of reference for applied research and technical expertise, it is your first step to fully secure systems.