Proceedings of the 3rd Workshop on Machine Learning and Systems最新文献_第2页

Best of both, Structured and Unstructured Sparsity in Neural Networks 神经网络中的结构化和非结构化稀疏性

Proceedings of the 3rd Workshop on Machine Learning and Systems Pub Date : 2023-05-08 DOI: 10.1145/3578356.3592583

Christoph Schulte, Sven Wagner, Armin Runge, Dimitrios Bariamis, B. Hammer

引用次数: 0

Accelerating Model Training: Performance Antipatterns Eliminator Framework 加速模型训练:性能反模式消除器框架

Proceedings of the 3rd Workshop on Machine Learning and Systems Pub Date : 2023-05-08 DOI: 10.1145/3578356.3592596

R. Singh, Mayank Mishra, Rekha Singhal

引用次数: 0

A First Look at the Impact of Distillation Hyper-Parameters in Federated Knowledge Distillation 浅谈蒸馏超参数对联邦知识蒸馏的影响

Proceedings of the 3rd Workshop on Machine Learning and Systems Pub Date : 2023-05-08 DOI: 10.1145/3578356.3592590

Norah Alballa, M. Canini

{"title":"A First Look at the Impact of Distillation Hyper-Parameters in Federated Knowledge Distillation","authors":"Norah Alballa, M. Canini","doi":"10.1145/3578356.3592590","DOIUrl":"https://doi.org/10.1145/3578356.3592590","url":null,"abstract":"Knowledge distillation has been known as a useful way for model compression. It has been recently adopted in the distributed training domain, such as federated learning, as a way to transfer knowledge between already pre-trained models. Knowledge distillation in distributed settings promises advantages, including significantly reducing the communication overhead and allowing heterogeneous model architectures. However, distillation is still not well studied and understood in such settings, which hinders the possible gains. We bridge this gap by performing an experimental analysis of the distillation process in the distributed training setting, mainly with non-IID data. We highlight some elements that require special considerations when transferring knowledge between already pre-trained models: the transfer set, the temperature, the weight, and the positioning. Appropriately tuning these hyper-parameters can remarkably boost learning outcomes. In our experiments, around two-thirds of the participants require settings other than commonly used default settings in literature, and appropriate tuning can reach more than five times improvement on average.","PeriodicalId":370204,"journal":{"name":"Proceedings of the 3rd Workshop on Machine Learning and Systems","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127848419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Profiling and Monitoring Deep Learning Training Tasks 分析和监控深度学习训练任务

Proceedings of the 3rd Workshop on Machine Learning and Systems Pub Date : 2023-05-08 DOI: 10.1145/3578356.3592589

Ehsan Yousefzadeh-Asl-Miandoab, Ties Robroek, Pinar Tozun

{"title":"Profiling and Monitoring Deep Learning Training Tasks","authors":"Ehsan Yousefzadeh-Asl-Miandoab, Ties Robroek, Pinar Tozun","doi":"10.1145/3578356.3592589","DOIUrl":"https://doi.org/10.1145/3578356.3592589","url":null,"abstract":"The embarrassingly parallel nature of deep learning training tasks makes CPU-GPU co-processors the primary commodity hardware for them. The computing and memory requirements of these tasks, however, do not always align well with the available GPU resources. It is, therefore, important to monitor and profile the behavior of training tasks on co-processors to understand better the requirements of different use cases. In this paper, our goal is to shed more light on the variety of tools for profiling and monitoring deep learning training tasks on server-grade NVIDIA GPUs. In addition to surveying the main characteristics of the tools, we analyze the functional limitations and overheads of each tool by using a both light and heavy training scenario. Our results show that monitoring tools like nvidia-smi and dcgm can be integrated with resource managers for online decision making thanks to their low overheads. On the other hand, one has to be careful about the set of metrics to correctly reason about the GPU utilization. When it comes to profiling, each tool has its time to shine; a framework-based or system-wide GPU profiler can first detect the frequent kernels or bottlenecks, and then, a lower-level GPU profiler can focus on particular kernels at the micro-architectural-level.","PeriodicalId":370204,"journal":{"name":"Proceedings of the 3rd Workshop on Machine Learning and Systems","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126000073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 3

FoldFormer: sequence folding and seasonal attention for fine-grained long-term FaaS forecasting FoldFormer:用于细粒度长期FaaS预测的序列折叠和季节性关注

Proceedings of the 3rd Workshop on Machine Learning and Systems Pub Date : 2023-05-08 DOI: 10.1145/3578356.3592582

L. N. Darlow, Artjom Joosen, Martin Asenov, Qiwen Deng, Jianfeng Wang, Adam Barker

{"title":"FoldFormer: sequence folding and seasonal attention for fine-grained long-term FaaS forecasting","authors":"L. N. Darlow, Artjom Joosen, Martin Asenov, Qiwen Deng, Jianfeng Wang, Adam Barker","doi":"10.1145/3578356.3592582","DOIUrl":"https://doi.org/10.1145/3578356.3592582","url":null,"abstract":"Fine-grained long-term (FGLT) time series forecasting is a fundamental challenge in Function as a Service (FaaS) platforms. The data that FaaS function requests produce are fine-grained (per-second/minute), often have daily periodicity, and are persistent over the long term. Forecasting in the FGLT data regime is challenging, and Transformer models can scale poorly for long sequences. We propose FoldFormer that combines several novel elements - time-to-latent folding, seasonal attention, and convolutions over FFT representations - as a new solution for FGLT forecasting of FaaS function requests. FoldFormer is designed to efficiently consume very fine-grained multi-day data with nearly no additional model, memory, or compute overhead, when compared to consuming coarse-grained data. We show either state-of-the-art or competitive performance for per-minute function requests on the top 5 most requested functions for three data sources, including two in-house Huawei Cloud sources and Azure 2019. We also show state-of-the-art performance at per-second granularity --- a regime that critically limits most other methods.","PeriodicalId":370204,"journal":{"name":"Proceedings of the 3rd Workshop on Machine Learning and Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130379693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Can Fair Federated Learning Reduce the need for Personalisation? 公平的联邦学习能减少对个性化的需求吗?

Proceedings of the 3rd Workshop on Machine Learning and Systems Pub Date : 2023-05-04 DOI: 10.1145/3578356.3592592

Alexandru Iacob, Pedro Gusmão, Nicholas D. Lane

{"title":"Can Fair Federated Learning Reduce the need for Personalisation?","authors":"Alexandru Iacob, Pedro Gusmão, Nicholas D. Lane","doi":"10.1145/3578356.3592592","DOIUrl":"https://doi.org/10.1145/3578356.3592592","url":null,"abstract":"Federated Learning (FL) enables training ML models on edge clients without sharing data. However, the federated model's performance on local data varies, disincentivising the participation of clients who benefit little from FL. Fair FL reduces accuracy disparity by focusing on clients with higher losses while personalisation locally fine-tunes the model. Personalisation provides a participation incentive when an FL model underperforms relative to one trained locally. For situations where the federated model provides a lower accuracy than a model trained entirely locally by a client, personalisation improves the accuracy of the pre-trained federated weights to be similar to or exceed those of the local client model. This paper evaluates two Fair FL (FFL) algorithms as starting points for personalisation. Our results show that FFL provides no benefit to relative performance in a language task and may double the number of underperforming clients for an image task. Instead, we propose Personalisation-aware Federated Learning (PaFL) as a paradigm that pre-emptively uses personalisation losses during training. Our technique shows a 50% reduction in the number of underperforming clients for the language task while lowering the number of underperforming clients in the image task instead of doubling it. Thus, evidence indicates that it may allow a broader set of devices to benefit from FL and represents a promising avenue for future experimentation and theoretical analysis.","PeriodicalId":370204,"journal":{"name":"Proceedings of the 3rd Workshop on Machine Learning and Systems","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126869642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Causal fault localisation in dataflow systems 数据流系统中的因果故障定位

Proceedings of the 3rd Workshop on Machine Learning and Systems Pub Date : 2023-04-24 DOI: 10.48550/arXiv.2304.11987

Andrei Paleyes, Neil D. Lawrence

引用次数: 1

Reconciling High Accuracy, Cost-Efficiency, and Low Latency of Inference Serving Systems 协调高准确度、低成本效益与低延迟的推理服务系统

Proceedings of the 3rd Workshop on Machine Learning and Systems Pub Date : 2023-04-21 DOI: 10.1145/3578356.3592578

Mehran Salmani, Saeid Ghafouri, Alireza Sanaee, Kamran Razavi, M. Muhlhauser, Joseph Doyle, Pooyan Jamshidi, Mohsen Sharif Iran University of Science, Technology, Queen Mary University London, Technical University of Darmstadt, U. O. N. Carolina

{"title":"Reconciling High Accuracy, Cost-Efficiency, and Low Latency of Inference Serving Systems","authors":"Mehran Salmani, Saeid Ghafouri, Alireza Sanaee, Kamran Razavi, M. Muhlhauser, Joseph Doyle, Pooyan Jamshidi, Mohsen Sharif Iran University of Science, Technology, Queen Mary University London, Technical University of Darmstadt, U. O. N. Carolina","doi":"10.1145/3578356.3592578","DOIUrl":"https://doi.org/10.1145/3578356.3592578","url":null,"abstract":"The use of machine learning (ML) inference for various applications is growing drastically. ML inference services engage with users directly, requiring fast and accurate responses. Moreover, these services face dynamic workloads of requests, imposing changes in their computing resources. Failing to right-size computing resources results in either latency service level objectives (SLOs) violations or wasted computing resources. Adapting to dynamic workloads considering all the pillars of accuracy, latency, and resource cost is challenging. In response to these challenges, we propose InfAdapter, which proactively selects a set of ML model variants with their resource allocations to meet latency SLO while maximizing an objective function composed of accuracy and cost. InfAdapter decreases SLO violation and costs up to 65% and 33%, respectively, compared to a popular industry autoscaler (Kubernetes Vertical Pod Autoscaler).","PeriodicalId":370204,"journal":{"name":"Proceedings of the 3rd Workshop on Machine Learning and Systems","volume":"183 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133208797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 3

Decentralized Learning Made Easy with DecentralizePy 使用DecentralizePy使分散学习变得容易

Proceedings of the 3rd Workshop on Machine Learning and Systems Pub Date : 2023-04-17 DOI: 10.1145/3578356.3592587

Akash Dhasade, Anne-Marie Kermarrec, Rafael Pires, Rishi Sharma, Milos Vujasinovic

引用次数: 2

Gradient-less Federated Gradient Boosting Tree with Learnable Learning Rates 具有可学习学习率的无梯度联邦梯度提升树

Proceedings of the 3rd Workshop on Machine Learning and Systems Pub Date : 2023-04-15 DOI: 10.1145/3578356.3592579

Chenyang Ma, Xinchi Qiu, Daniel J. Beutel, N. Lane

引用次数: 3