A decentralized data evaluation framework in federated learning

IF 6.9 | Zone 3, Computer Science | JCR Q1: COMPUTER SCIENCE, INFORMATION SYSTEMS

Blockchain-Research and Applications | Pub Date: 2023-12-01 | DOI: 10.1016/j.bcra.2023.100152

Laveen Bhatia, Saeed Samet

Volume 4, Issue 4, Article 100152. Full text: https://www.sciencedirect.com/science/article/pii/S2096720923000271

Citations: 1

Abstract

Federated Learning (FL) is a type of distributed deep learning framework in which multiple devices each train a local model on local data, and the gradients of the local models are then sent to a central server that aggregates them to create a global model. This type of framework is ideal where data privacy is of utmost importance because the data never leave the local device. However, a major concern in FL is ensuring the quality of the local training data. Since there is no control over the local training data, ensuring that the local model is trained on clean data becomes challenging. Training on poor-quality data can significantly reduce a model's accuracy. In this paper, we propose a decentralized approach using blockchain to ensure local model data quality. We use miners to validate each local model by checking its accuracy against a secret testing dataset. This is done using a smart contract that the miners invoke during the mining process. The local model is aggregated with the global model only if it passes a preset accuracy threshold. We test our proposed method on two datasets: the Brain Tumor Classification dataset from Kaggle, comprising 7000 MRI images divided into two classes (Tumor/No Tumor), and the Medical MNIST dataset, which includes 58,954 images classified into six different classes: AbdomenCT, BreastMRI, ChestCT, Chest X-ray, Hand X-ray, and HeadCT. Our results show that our method outperforms the original FL approach in all experiments.
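The validation rule described in the abstract (accept a local model only if its accuracy on a secret test set passes a preset threshold) can be illustrated with a minimal Python sketch. This is not the authors' smart contract or mining code: the linear classifier, the function names `accuracy` and `validate_and_aggregate`, the unweighted averaging step, and the fallback to the previous global model are all assumptions made for illustration.

```python
import numpy as np


def accuracy(weights, X, y):
    """Accuracy of a toy linear classifier: predict 1 where X @ weights > 0."""
    preds = (X @ weights > 0).astype(int)
    return float(np.mean(preds == y))


def validate_and_aggregate(global_w, local_models, X_secret, y_secret, threshold):
    """Keep only local models that reach `threshold` accuracy on the secret
    test set, then average them (assumed, unweighted aggregation).

    Returns the new global weights and the number of accepted models.
    If no model passes, the current global weights are returned unchanged.
    """
    accepted = [
        w for w in local_models
        if accuracy(w, X_secret, y_secret) >= threshold
    ]
    if not accepted:
        return global_w, 0
    return np.mean(accepted, axis=0), len(accepted)


# Hypothetical secret test set held by the validators.
X_secret = np.array([[1.0, 0.0], [2.0, 1.0], [-1.0, 0.0], [-2.0, -1.0]])
y_secret = np.array([1, 1, 0, 0])

# Two models trained on clean data and one trained on poor-quality data.
local_models = [
    np.array([1.0, 0.0]),
    np.array([3.0, 0.0]),
    np.array([-1.0, 0.0]),  # inverted decision boundary, fails validation
]

new_global, n_accepted = validate_and_aggregate(
    np.zeros(2), local_models, X_secret, y_secret, threshold=0.8
)
print(new_global, n_accepted)
```

In this toy setup the third model is rejected, so it cannot pull the aggregate toward its inverted decision boundary. In the paper's framework this check is performed by miners through a smart contract rather than by a single trusted function.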




Source journal

Blockchain-Research and Applications

CiteScore

11.30

Self-citation rate

3.60%


Journal description: Blockchain: Research and Applications is an international, peer-reviewed journal for researchers, engineers, and practitioners to present the latest advances and innovations in blockchain research. The journal publishes theoretical and applied papers in established and emerging areas of blockchain research to shape the future of blockchain technology.