DFedForest: Decentralized Federated Forest
Lucas Airam C. de Souza, G. Rebello, G. Camilo, Lucas C. B. Guimarães, O. Duarte
2020 IEEE International Conference on Blockchain (Blockchain), published 2020-11-01
DOI: 10.1109/Blockchain50366.2020.00019 (https://doi.org/10.1109/Blockchain50366.2020.00019)
Citations: 27
Abstract
The effectiveness of machine learning systems depends heavily on the relevance of the training data. The collected data is usually sensitive and private because it comes from devices and sensors used in people’s daily lives. The General Data Protection Regulation (GDPR) in Europe, the California Consumer Privacy Act (CCPA), and China’s Cybersecurity Law put the current approach at risk, as they prohibit centralized remote processing of sensitive data collected in a distributed manner. This paper proposes a distributed machine learning system in which each domain builds a local random forest from decision trees shared through a blockchain. The results show that the proposed approach matches or exceeds the results obtained by random forests trained only on local data. Furthermore, the proposal improves the detection of new attacks when the domains have different threat distributions.
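
The idea of sharing trees rather than data can be illustrated with a minimal sketch. The abstract does not say how trees are trained, selected, or stored on the chain, so everything below is an assumption made for illustration: a plain Python list stands in for the blockchain, each tree is a depth-1 decision tree (a stump), and the two domains, their feature values, and the labels `dos` and `scan` are hypothetical toy data. This is not the authors' implementation.

```python
from collections import Counter

def train_stump(samples):
    """Fit a depth-1 decision tree on (features, label) pairs.

    Assumes at least one feature takes two distinct values.
    """
    best = None
    for f in range(len(samples[0][0])):
        for t in sorted({x[f] for x, _ in samples}):
            left = [y for x, y in samples if x[f] <= t]
            right = [y for x, y in samples if x[f] > t]
            if not left or not right:
                continue  # threshold does not split the data
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    return best[1:]  # (feature index, threshold, left label, right label)

def tree_votes(forest, x):
    """Return the label each tree in the forest assigns to sample x."""
    return [l if x[f] <= t else r for f, t, l, r in forest]

def predict_forest(forest, x):
    """Majority vote over the trees (ties go to the first label seen)."""
    return Counter(tree_votes(forest, x)).most_common(1)[0][0]

# Hypothetical local datasets: (feature 0, feature 1) -> label.
# Domain A has only seen "dos" attacks, domain B only "scan" attacks.
domain_a = [((1, 1), "normal"), ((2, 2), "normal"), ((9, 1), "dos"), ((8, 2), "dos")]
domain_b = [((1, 1), "normal"), ((2, 2), "normal"), ((1, 9), "scan"), ((2, 8), "scan")]

# Each domain trains locally and publishes only the tree, never the raw data.
ledger = []  # stand-in for the blockchain (assumption)
stump_a = train_stump(domain_a)
stump_b = train_stump(domain_b)
ledger.append({"domain": "A", "tree": stump_a})
ledger.append({"domain": "B", "tree": stump_b})

# Any domain can now assemble a forest from the shared trees.
shared_forest = [entry["tree"] for entry in ledger]
local_only_a = [stump_a]

unseen_for_a = (1, 9)  # looks like a "scan", which domain A never observed
print(tree_votes(local_only_a, unseen_for_a))
print(tree_votes(shared_forest, unseen_for_a))
```

In this toy setting, domain A's own tree labels the unseen sample as normal, while the forest assembled from the ledger contains a tree from domain B that flags it. A real system would need many trees per domain and a rule for choosing which shared trees to keep, neither of which is specified here.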