On the Robustness of Random Forest Against Untargeted Data Poisoning: An Ensemble-Based Approach

Impact Factor: 3.0 | Region 3 (Computer Science) | JCR Q2 (Computer Science, Hardware & Architecture)

IEEE Transactions on Sustainable Computing, vol. 8, no. 4, pp. 540-554 | Pub Date: 2023-07-07 | DOI: 10.1109/TSUSC.2023.3293269

Marco Anisetti;Claudio A. Ardagna;Alessandro Balestrucci;Nicola Bena;Ernesto Damiani;Chan Yeob Yeun


Citations: 0

Abstract


Machine learning is becoming ubiquitous. From finance to medicine, machine learning models are boosting decision-making processes and even outperforming humans in some tasks. This huge progress in prediction quality does not, however, find a counterpart in the security of such models and their predictions, where perturbations of fractions of the training set (poisoning) can seriously undermine model accuracy. Research on poisoning attacks and defenses has received increasing attention in the last decade, leading to several promising solutions that aim to increase the robustness of machine learning. Among them, ensemble-based defenses, where different models are trained on portions of the training set and their predictions are then aggregated, provide strong theoretical guarantees at the price of a linear overhead. Surprisingly, ensemble-based defenses, which do not pose any restrictions on the base model, have not been applied to increase the robustness of random forest. The work in this paper aims to fill this gap by designing and implementing a novel hash-based ensemble approach that protects random forest against untargeted, random poisoning attacks. An extensive experimental evaluation measures the performance of our approach against a variety of attacks, as well as its sustainability in terms of resource consumption and performance, and compares it with a traditional monolithic model based on random forest. A final discussion presents our main findings and compares our approach with existing poisoning defenses targeting random forests.
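The ensemble idea summarized in the abstract (train separate models on portions of the training set, then aggregate their predictions) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The hashing scheme (SHA-256 over the feature tuple), all function names, and the nearest-centroid base learner are assumptions chosen to keep the example dependency-free. In the paper's setting the base models would presumably be random forests, which could be supplied through the hypothetical `train_fn` parameter. The `flip_labels` helper is likewise only one simple example of an untargeted, random perturbation of a fraction of the training set.

```python
import hashlib
import random
from collections import Counter


def partition_by_hash(samples, n_partitions):
    """Assign each (features, label) pair to a partition by hashing its features.

    Assumption: SHA-256 over repr(features); the paper's hash may differ.
    """
    partitions = [[] for _ in range(n_partitions)]
    for features, label in samples:
        digest = hashlib.sha256(repr(features).encode("utf-8")).hexdigest()
        partitions[int(digest, 16) % n_partitions].append((features, label))
    return partitions


def train_centroid_model(samples):
    """Hypothetical stand-in base learner: one mean vector (centroid) per label."""
    sums, counts = {}, Counter()
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] += 1
    centroids = {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

    def predict(features):
        # Return the label whose centroid is closest (squared Euclidean distance).
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(features, centroids[lab])),
        )

    return predict


def train_ensemble(samples, n_partitions, train_fn=train_centroid_model):
    """Train one base model per non-empty hash partition."""
    return [train_fn(part) for part in partition_by_hash(samples, n_partitions) if part]


def predict_majority(models, features):
    """Aggregate the base models' predictions by plurality vote."""
    votes = Counter(model(features) for model in models)
    return votes.most_common(1)[0][0]


def flip_labels(samples, fraction, labels, rng):
    """Untargeted random poisoning: flip the label of a random fraction of samples."""
    poisoned = list(samples)
    for idx in rng.sample(range(len(poisoned)), int(fraction * len(poisoned))):
        features, label = poisoned[idx]
        poisoned[idx] = (features, rng.choice([lab for lab in labels if lab != label]))
    return poisoned


if __name__ == "__main__":
    rng = random.Random(0)
    # Two synthetic Gaussian clusters: label 0 around (0, 0), label 1 around (5, 5).
    clean = [((rng.gauss(c, 1.0), rng.gauss(c, 1.0)), lab)
             for lab, c in ((0, 0.0), (1, 5.0)) for _ in range(100)]
    poisoned = flip_labels(clean, 0.1, [0, 1], rng)
    models = train_ensemble(poisoned, n_partitions=5)
    print(predict_majority(models, (0.0, 0.0)), predict_majority(models, (5.0, 5.0)))
```

Because a sample's partition depends only on its features in this sketch, flipping a label leaves the sample in the same partition, so each poisoned sample can influence at most one base model. Training one model per partition also means the cost grows with the number of partitions, which is one way to read the "linear overhead" mentioned in the abstract.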


Source journal

IEEE Transactions on Sustainable Computing | Mathematics: Control and Optimization

CiteScore

7.70

Self-citation rate

2.60%

Articles published