R. Guerraoui, Arsany Guirguis, Jérémy Plassmann, Anton Ragot, Sébastien Rouault
{"title":"加菲猫:拜占庭机器学习的系统支持(普通论文)","authors":"R. Guerraoui, Arsany Guirguis, Jérémy Plassmann, Anton Ragot, Sébastien Rouault","doi":"10.1109/DSN48987.2021.00021","DOIUrl":null,"url":null,"abstract":"We present GARFIELD, a library to transparently make machine learning (ML) applications, initially built with popular (but fragile) frameworks, e.g., TensorFlow and PyTorch, Byzantine–resilient. GARFIELD relies on a novel object–oriented design, reducing the coding effort, and addressing the vulnerability of the shared–graph architecture followed by classical ML frameworks. GARFIELD encompasses various communication patterns and supports computations on CPUs and GPUs, allowing addressing the general question of the practical cost of Byzantine resilience in ML applications. We report on the usage of GARFIELD on three main ML architectures: (a) a single server with multiple workers, (b) several servers and workers, and (c) peer–to–peer settings. Using GARFIELD, we highlight interesting facts about the cost of Byzantine resilience. In particular, (a) Byzantine resilience, unlike crash resilience, induces an accuracy loss, (b) the throughput overhead comes more from communication than from robust aggregation, and (c) tolerating Byzantine servers costs more than tolerating Byzantine workers.","PeriodicalId":222512,"journal":{"name":"2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"GARFIELD: System Support for Byzantine Machine Learning (Regular Paper)\",\"authors\":\"R. Guerraoui, Arsany Guirguis, Jérémy Plassmann, Anton Ragot, Sébastien Rouault\",\"doi\":\"10.1109/DSN48987.2021.00021\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We present GARFIELD, a library to transparently make machine learning (ML) applications, initially built with popular (but fragile) frameworks, e.g., TensorFlow and PyTorch, Byzantine–resilient. GARFIELD relies on a novel object–oriented design, reducing the coding effort, and addressing the vulnerability of the shared–graph architecture followed by classical ML frameworks. GARFIELD encompasses various communication patterns and supports computations on CPUs and GPUs, allowing addressing the general question of the practical cost of Byzantine resilience in ML applications. We report on the usage of GARFIELD on three main ML architectures: (a) a single server with multiple workers, (b) several servers and workers, and (c) peer–to–peer settings. Using GARFIELD, we highlight interesting facts about the cost of Byzantine resilience. In particular, (a) Byzantine resilience, unlike crash resilience, induces an accuracy loss, (b) the throughput overhead comes more from communication than from robust aggregation, and (c) tolerating Byzantine servers costs more than tolerating Byzantine workers.\",\"PeriodicalId\":222512,\"journal\":{\"name\":\"2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)\",\"volume\":\"4 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/DSN48987.2021.00021\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DSN48987.2021.00021","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
GARFIELD: System Support for Byzantine Machine Learning (Regular Paper)
We present GARFIELD, a library to transparently make machine learning (ML) applications, initially built with popular (but fragile) frameworks, e.g., TensorFlow and PyTorch, Byzantine–resilient. GARFIELD relies on a novel object–oriented design, reducing the coding effort, and addressing the vulnerability of the shared–graph architecture followed by classical ML frameworks. GARFIELD encompasses various communication patterns and supports computations on CPUs and GPUs, allowing addressing the general question of the practical cost of Byzantine resilience in ML applications. We report on the usage of GARFIELD on three main ML architectures: (a) a single server with multiple workers, (b) several servers and workers, and (c) peer–to–peer settings. Using GARFIELD, we highlight interesting facts about the cost of Byzantine resilience. In particular, (a) Byzantine resilience, unlike crash resilience, induces an accuracy loss, (b) the throughput overhead comes more from communication than from robust aggregation, and (c) tolerating Byzantine servers costs more than tolerating Byzantine workers.