Proceedings of the 3rd Workshop on Machine Learning and Systems: Latest Publications

Best of both, Structured and Unstructured Sparsity in Neural Networks
Proceedings of the 3rd Workshop on Machine Learning and Systems | Pub Date: 2023-05-08 | DOI: 10.1145/3578356.3592583
Christoph Schulte, Sven Wagner, Armin Runge, Dimitrios Bariamis, B. Hammer
{"title":"Best of both, Structured and Unstructured Sparsity in Neural Networks","authors":"Christoph Schulte, Sven Wagner, Armin Runge, Dimitrios Bariamis, B. Hammer","doi":"10.1145/3578356.3592583","DOIUrl":"https://doi.org/10.1145/3578356.3592583","url":null,"abstract":"Besides quantization, pruning has shown to be one of the most effective methods to reduce the inference time and required energy of Deep Neural Networks (DNNs). In this work, we propose a sparsity definition that reflects the number of saved operations by pruned parameters to guide the pruning process in order to save as many operations as possible. Based on this, we show the importance of the baseline model's size and quantify the overhead of unstructured sparsity for a commercial-of-the-shelf AI Hardware Accelerator (HWA) in terms of latency reductions. Furthermore, we show that a combination of both structured and unstructured sparsity can mitigate this effect.","PeriodicalId":370204,"journal":{"name":"Proceedings of the 3rd Workshop on Machine Learning and Systems","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130780885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
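The operation-aware sparsity idea can be illustrated with a small sketch. This is not the authors' definition or code: the layer shapes, the pruning pattern, and the counting rule (one multiply-accumulate per weight per output position) are assumptions, chosen only to show why weighting pruned parameters by the operations they save differs from plain parameter sparsity once several layers are aggregated.

```python
# Hedged sketch: operation-weighted vs. parameter-count sparsity across two conv layers.
# Shapes, pruning pattern, and the MAC-counting rule are illustrative assumptions.
import numpy as np

def layer_stats(weight, out_hw):
    """Count pruned/total parameters and the MACs they represent for one conv layer."""
    pruned = np.count_nonzero(weight == 0)
    total = weight.size
    macs_per_weight = out_hw * out_hw      # assumption: one MAC per weight per output position
    return pruned, total, pruned * macs_per_weight, total * macs_per_weight

rng = np.random.default_rng(0)
early = rng.standard_normal((64, 32, 3, 3))    # early layer: large 56x56 output maps
early[:32] = 0.0                               # prune half of its output channels (structured)
late = rng.standard_normal((512, 512, 3, 3))   # late layer: small 7x7 output maps
late[:128] = 0.0                               # prune a quarter of its output channels

p1, t1, s1, m1 = layer_stats(early, 56)
p2, t2, s2, m2 = layer_stats(late, 7)

param_sparsity = (p1 + p2) / (t1 + t2)   # dominated by the parameter-heavy late layer
op_sparsity = (s1 + s2) / (m1 + m2)      # dominated by the operation-heavy early layer
print(f"parameter sparsity: {param_sparsity:.1%}, operation sparsity: {op_sparsity:.1%}")
```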
Accelerating Model Training: Performance Antipatterns Eliminator Framework
Proceedings of the 3rd Workshop on Machine Learning and Systems | Pub Date: 2023-05-08 | DOI: 10.1145/3578356.3592596
R. Singh, Mayank Mishra, Rekha Singhal
{"title":"Accelerating Model Training: Performance Antipatterns Eliminator Framework","authors":"R. Singh, Mayank Mishra, Rekha Singhal","doi":"10.1145/3578356.3592596","DOIUrl":"https://doi.org/10.1145/3578356.3592596","url":null,"abstract":"In the realm of ML/DL training pipelines, the training-specific data preparation of complex models may consume up to 87% of the total training time. A data scientist may build training pipelines using Python data structures on GPU while being unaware of the performance antipatterns that arise due to communication between CPU and GPU during model training, etc. These antipatterns may not be easily identifiable using traditional profiling tools alone. In this paper, we propose Performance Antipatterns Eliminator Framework (PAEF), a framework to identify six performance antipatterns occurring due to data movements between CPU and GPU during training. Our framework co-relates profiles of CPU and GPU executions of the pipeline along with the static analysis of the code to identify the performance antipatterns. We further replace these antipatterns with their performant versions. We evaluate the benefits of PAEF for two industrial recommendation models, where we showcase up to 7X speedup by using PAEF over the original pipeline.","PeriodicalId":370204,"journal":{"name":"Proceedings of the 3rd Workshop on Machine Learning and Systems","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129682230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
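One common instance of this class of antipattern, repeated small host-to-device copies inside a training loop, can be sketched in PyTorch. This is a generic illustration, not one of PAEF's six antipatterns taken verbatim; the tensor shapes and workload are invented.

```python
# Hedged illustration of a CPU-GPU data-movement antipattern and a typical fix.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
data = [torch.randn(256, 1024) for _ in range(100)]   # pre-processed batches living on the CPU

# Antipattern: one small host-to-device copy per loop iteration, each adding
# transfer latency and host-device synchronisation inside the hot loop.
total = torch.zeros(1024, device=device)
for x in data:
    total += x.to(device).sum(dim=0)

# Performant version: a single bulk transfer, then all reduction work stays on the GPU.
batched = torch.stack(data).to(device, non_blocking=True)
total_fast = batched.sum(dim=(0, 1))

print(torch.allclose(total, total_fast, atol=1e-2))
```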
A First Look at the Impact of Distillation Hyper-Parameters in Federated Knowledge Distillation
Proceedings of the 3rd Workshop on Machine Learning and Systems | Pub Date: 2023-05-08 | DOI: 10.1145/3578356.3592590
Norah Alballa, M. Canini
{"title":"A First Look at the Impact of Distillation Hyper-Parameters in Federated Knowledge Distillation","authors":"Norah Alballa, M. Canini","doi":"10.1145/3578356.3592590","DOIUrl":"https://doi.org/10.1145/3578356.3592590","url":null,"abstract":"Knowledge distillation has been known as a useful way for model compression. It has been recently adopted in the distributed training domain, such as federated learning, as a way to transfer knowledge between already pre-trained models. Knowledge distillation in distributed settings promises advantages, including significantly reducing the communication overhead and allowing heterogeneous model architectures. However, distillation is still not well studied and understood in such settings, which hinders the possible gains. We bridge this gap by performing an experimental analysis of the distillation process in the distributed training setting, mainly with non-IID data. We highlight some elements that require special considerations when transferring knowledge between already pre-trained models: the transfer set, the temperature, the weight, and the positioning. Appropriately tuning these hyper-parameters can remarkably boost learning outcomes. In our experiments, around two-thirds of the participants require settings other than commonly used default settings in literature, and appropriate tuning can reach more than five times improvement on average.","PeriodicalId":370204,"journal":{"name":"Proceedings of the 3rd Workshop on Machine Learning and Systems","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127848419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
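Two of the highlighted hyper-parameters, the temperature and the weight, appear directly in the standard distillation objective. The sketch below uses the common KL-plus-cross-entropy formulation from the distillation literature; the exact objective and default values used in the paper may differ.

```python
# Hedged sketch of a temperature- and weight-controlled distillation loss,
# following the common formulation rather than the paper's exact objective.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=4.0, weight=0.5):
    # Soft-label term: KL divergence between temperature-softened distributions,
    # scaled by T^2 so gradient magnitudes stay comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Hard-label term: ordinary cross-entropy on the transfer-set labels.
    hard = F.cross_entropy(student_logits, labels)
    return weight * soft + (1.0 - weight) * hard

s = torch.randn(8, 10)
t = torch.randn(8, 10)
y = torch.randint(0, 10, (8,))
print(distillation_loss(s, t, y))
```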
Profiling and Monitoring Deep Learning Training Tasks
Proceedings of the 3rd Workshop on Machine Learning and Systems | Pub Date: 2023-05-08 | DOI: 10.1145/3578356.3592589
Ehsan Yousefzadeh-Asl-Miandoab, Ties Robroek, Pinar Tozun
{"title":"Profiling and Monitoring Deep Learning Training Tasks","authors":"Ehsan Yousefzadeh-Asl-Miandoab, Ties Robroek, Pinar Tozun","doi":"10.1145/3578356.3592589","DOIUrl":"https://doi.org/10.1145/3578356.3592589","url":null,"abstract":"The embarrassingly parallel nature of deep learning training tasks makes CPU-GPU co-processors the primary commodity hardware for them. The computing and memory requirements of these tasks, however, do not always align well with the available GPU resources. It is, therefore, important to monitor and profile the behavior of training tasks on co-processors to understand better the requirements of different use cases. In this paper, our goal is to shed more light on the variety of tools for profiling and monitoring deep learning training tasks on server-grade NVIDIA GPUs. In addition to surveying the main characteristics of the tools, we analyze the functional limitations and overheads of each tool by using a both light and heavy training scenario. Our results show that monitoring tools like nvidia-smi and dcgm can be integrated with resource managers for online decision making thanks to their low overheads. On the other hand, one has to be careful about the set of metrics to correctly reason about the GPU utilization. When it comes to profiling, each tool has its time to shine; a framework-based or system-wide GPU profiler can first detect the frequent kernels or bottlenecks, and then, a lower-level GPU profiler can focus on particular kernels at the micro-architectural-level.","PeriodicalId":370204,"journal":{"name":"Proceedings of the 3rd Workshop on Machine Learning and Systems","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126000073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
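As a rough picture of the low-overhead monitoring side, the sketch below polls nvidia-smi's query interface from Python at 1 Hz. The query fields and flags are standard nvidia-smi options; the polling rate and the caveat about coarse utilization are editorial illustration, not results from the paper.

```python
# Hedged sketch: periodic sampling of nvidia-smi, the kind of low-overhead
# monitoring the paper contrasts with heavier profilers.
import subprocess
import time

QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]

def sample():
    out = subprocess.check_output(QUERY, text=True).strip().splitlines()
    # One line per GPU: "<utilization %>, <memory MiB>".
    # Note: utilization.gpu only reports the fraction of time any kernel was
    # active, so it can read 100% even when compute units are mostly idle.
    return [tuple(float(v) for v in line.split(", ")) for line in out]

for _ in range(3):
    print(sample())
    time.sleep(1.0)  # low-frequency polling keeps the monitoring overhead negligible
```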
FoldFormer: sequence folding and seasonal attention for fine-grained long-term FaaS forecasting
Proceedings of the 3rd Workshop on Machine Learning and Systems | Pub Date: 2023-05-08 | DOI: 10.1145/3578356.3592582
L. N. Darlow, Artjom Joosen, Martin Asenov, Qiwen Deng, Jianfeng Wang, Adam Barker
{"title":"FoldFormer: sequence folding and seasonal attention for fine-grained long-term FaaS forecasting","authors":"L. N. Darlow, Artjom Joosen, Martin Asenov, Qiwen Deng, Jianfeng Wang, Adam Barker","doi":"10.1145/3578356.3592582","DOIUrl":"https://doi.org/10.1145/3578356.3592582","url":null,"abstract":"Fine-grained long-term (FGLT) time series forecasting is a fundamental challenge in Function as a Service (FaaS) platforms. The data that FaaS function requests produce are fine-grained (per-second/minute), often have daily periodicity, and are persistent over the long term. Forecasting in the FGLT data regime is challenging, and Transformer models can scale poorly for long sequences. We propose FoldFormer that combines several novel elements - time-to-latent folding, seasonal attention, and convolutions over FFT representations - as a new solution for FGLT forecasting of FaaS function requests. FoldFormer is designed to efficiently consume very fine-grained multi-day data with nearly no additional model, memory, or compute overhead, when compared to consuming coarse-grained data. We show either state-of-the-art or competitive performance for per-minute function requests on the top 5 most requested functions for three data sources, including two in-house Huawei Cloud sources and Azure 2019. We also show state-of-the-art performance at per-second granularity --- a regime that critically limits most other methods.","PeriodicalId":370204,"journal":{"name":"Proceedings of the 3rd Workshop on Machine Learning and Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130379693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
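Time-to-latent folding is only named in the abstract, but the general idea of folding a long per-minute series so that daily seasonality aligns along one axis can be sketched with a plain reshape. The synthetic signal and day-length constants below are assumptions; FoldFormer's actual folding and seasonal attention are more involved than this.

```python
# Hedged sketch: fold a per-minute series into (days, minute-of-day) so the
# daily cycle lines up along one axis. Data and constants are invented.
import numpy as np

minutes_per_day, n_days = 1440, 7
t = np.arange(minutes_per_day * n_days)
rng = np.random.default_rng(0)
requests = 100 + 50 * np.sin(2 * np.pi * t / minutes_per_day) + rng.normal(0, 5, t.size)

folded = requests.reshape(n_days, minutes_per_day)  # rows: days, columns: minute-of-day
daily_profile = folded.mean(axis=0)                 # seasonal shape shared across days
residual = folded - daily_profile                   # what remains for a model to explain
print(folded.shape, round(requests.std(), 1), round(residual.std(), 1))
```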
Can Fair Federated Learning Reduce the need for Personalisation?
Proceedings of the 3rd Workshop on Machine Learning and Systems | Pub Date: 2023-05-04 | DOI: 10.1145/3578356.3592592
Alexandru Iacob, Pedro Gusmão, Nicholas D. Lane
{"title":"Can Fair Federated Learning Reduce the need for Personalisation?","authors":"Alexandru Iacob, Pedro Gusmão, Nicholas D. Lane","doi":"10.1145/3578356.3592592","DOIUrl":"https://doi.org/10.1145/3578356.3592592","url":null,"abstract":"Federated Learning (FL) enables training ML models on edge clients without sharing data. However, the federated model's performance on local data varies, disincentivising the participation of clients who benefit little from FL. Fair FL reduces accuracy disparity by focusing on clients with higher losses while personalisation locally fine-tunes the model. Personalisation provides a participation incentive when an FL model underperforms relative to one trained locally. For situations where the federated model provides a lower accuracy than a model trained entirely locally by a client, personalisation improves the accuracy of the pre-trained federated weights to be similar to or exceed those of the local client model. This paper evaluates two Fair FL (FFL) algorithms as starting points for personalisation. Our results show that FFL provides no benefit to relative performance in a language task and may double the number of underperforming clients for an image task. Instead, we propose Personalisation-aware Federated Learning (PaFL) as a paradigm that pre-emptively uses personalisation losses during training. Our technique shows a 50% reduction in the number of underperforming clients for the language task while lowering the number of underperforming clients in the image task instead of doubling it. Thus, evidence indicates that it may allow a broader set of devices to benefit from FL and represents a promising avenue for future experimentation and theoretical analysis.","PeriodicalId":370204,"journal":{"name":"Proceedings of the 3rd Workshop on Machine Learning and Systems","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126869642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
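Personalisation as used here means locally fine-tuning the pre-trained federated weights on a client's own data. The sketch below shows that step in generic PyTorch; the model, optimiser, learning rate, and number of epochs are placeholders rather than the paper's setup.

```python
# Hedged sketch of personalisation as local fine-tuning of federated weights.
import copy

import torch
import torch.nn as nn

def personalise(global_model: nn.Module, local_loader, epochs: int = 1, lr: float = 1e-3):
    """Fine-tune a copy of the federated weights on one client's local data."""
    model = copy.deepcopy(global_model)   # keep the shared federated model untouched
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in local_loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model                          # client-specific personalised model

# Tiny usage example with a dummy linear model and random local data.
dummy = nn.Linear(10, 3)
loader = [(torch.randn(16, 10), torch.randint(0, 3, (16,))) for _ in range(4)]
personal = personalise(dummy, loader)
```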
Causal fault localisation in dataflow systems
Proceedings of the 3rd Workshop on Machine Learning and Systems | Pub Date: 2023-04-24 | DOI: 10.48550/arXiv.2304.11987
Andrei Paleyes, Neil D. Lawrence
{"title":"Causal fault localisation in dataflow systems","authors":"Andrei Paleyes, Neil D. Lawrence","doi":"10.48550/arXiv.2304.11987","DOIUrl":"https://doi.org/10.48550/arXiv.2304.11987","url":null,"abstract":"Dataflow computing was shown to bring significant benefits to multiple niches of systems engineering and has the potential to become a general-purpose paradigm of choice for data-driven application development. One of the characteristic features of dataflow computing is the natural access to the dataflow graph of the entire system. Recently it has been observed that these dataflow graphs can be treated as complete graphical causal models, opening opportunities to apply causal inference techniques to dataflow systems. In this demonstration paper we aim to provide the first practical validation of this idea with a particular focus on causal fault localisation. We provide multiple demonstrations of how causal inference can be used to detect software bugs and data shifts in multiple scenarios with three modern dataflow engines.","PeriodicalId":370204,"journal":{"name":"Proceedings of the 3rd Workshop on Machine Learning and Systems","volume":"433 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121250184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
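A toy illustration of why the dataflow graph helps with fault localisation: only nodes upstream of the node where an anomaly is observed can be its cause, so the graph immediately narrows the candidate set. The graph below is invented, and the paper's method goes further by applying causal-inference attribution on top of this structure rather than stopping at graph ancestry.

```python
# Hedged toy sketch: use dataflow-graph structure to narrow fault candidates.
import networkx as nx

# Invented dataflow graph: nodes are processing steps, edges are data dependencies.
g = nx.DiGraph([
    ("ingest", "clean"),
    ("clean", "featurise"),
    ("featurise", "model"),
    ("config", "model"),
    ("model", "report"),
])

anomalous_node = "report"                     # where the bad output was observed
candidates = nx.ancestors(g, anomalous_node)  # only upstream nodes can have caused it
print(sorted(candidates))                     # ['clean', 'config', 'featurise', 'ingest', 'model']
```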
Reconciling High Accuracy, Cost-Efficiency, and Low Latency of Inference Serving Systems
Proceedings of the 3rd Workshop on Machine Learning and Systems | Pub Date: 2023-04-21 | DOI: 10.1145/3578356.3592578
Mehran Salmani, Saeid Ghafouri, Alireza Sanaee, Kamran Razavi, M. Muhlhauser, Joseph Doyle, Pooyan Jamshidi, Mohsen Sharifi (Iran University of Science and Technology, Queen Mary University of London, Technical University of Darmstadt, U. of N. Carolina)
The use of machine learning (ML) inference for various applications is growing drastically. ML inference services engage with users directly, requiring fast and accurate responses. Moreover, these services face dynamic workloads of requests, requiring changes in their computing resources. Failing to right-size computing resources results either in violations of latency service level objectives (SLOs) or in wasted computing resources. Adapting to dynamic workloads while accounting for all the pillars of accuracy, latency, and resource cost is challenging. In response to these challenges, we propose InfAdapter, which proactively selects a set of ML model variants with their resource allocations to meet the latency SLO while maximizing an objective function composed of accuracy and cost. InfAdapter decreases SLO violations and cost by up to 65% and 33%, respectively, compared to a popular industry autoscaler (Kubernetes Vertical Pod Autoscaler).
Citations: 3
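The selection problem InfAdapter addresses can be sketched as a small search over model variants and replica counts that satisfy the latency SLO while maximizing an accuracy-minus-cost objective. All numbers, the crude latency estimate, and the objective weighting below are invented for illustration; InfAdapter's actual formulation and autoscaling logic are more sophisticated.

```python
# Hedged sketch of SLO-constrained variant selection; all values are invented.
# Invented variant table: (name, accuracy, p99 latency per replica [ms], cost per replica).
variants = [
    ("resnet18", 0.69, 40.0, 1.0),
    ("resnet50", 0.76, 90.0, 2.0),
    ("resnet152", 0.78, 210.0, 4.0),
]
SLO_MS, MAX_REPLICAS, ALPHA = 100.0, 8, 0.01  # ALPHA trades accuracy against cost

def best_config(load_factor: float = 1.0):
    """Pick the (variant, replicas) pair that meets the SLO and maximises accuracy - cost."""
    feasible = []
    for name, acc, lat, cost in variants:
        for replicas in range(1, MAX_REPLICAS + 1):
            est_latency = lat * load_factor / replicas  # crude capacity model, for illustration
            if est_latency <= SLO_MS:
                feasible.append((acc - ALPHA * cost * replicas, name, replicas))
    return max(feasible) if feasible else None

print(best_config(load_factor=2.0))  # heavier load pushes the choice toward cheaper variants
```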
Decentralized Learning Made Easy with DecentralizePy
Proceedings of the 3rd Workshop on Machine Learning and Systems | Pub Date: 2023-04-17 | DOI: 10.1145/3578356.3592587
Akash Dhasade, Anne-Marie Kermarrec, Rafael Pires, Rishi Sharma, Milos Vujasinovic
{"title":"Decentralized Learning Made Easy with DecentralizePy","authors":"Akash Dhasade, Anne-Marie Kermarrec, Rafael Pires, Rishi Sharma, Milos Vujasinovic","doi":"10.1145/3578356.3592587","DOIUrl":"https://doi.org/10.1145/3578356.3592587","url":null,"abstract":"Decentralized learning (DL) has gained prominence for its potential benefits in terms of scalability, privacy, and fault tolerance. It consists of many nodes that coordinate without a central server and exchange millions of parameters in the inherently iterative process of machine learning (ML) training. In addition, these nodes are connected in complex and potentially dynamic topologies. Assessing the intricate dynamics of such networks is clearly not an easy task. Often in literature, researchers resort to simulated environments that do not scale and fail to capture practical and crucial behaviors, including the ones associated to parallelism, data transfer, network delays, and wall-clock time. In this paper, we propose decentralizepy, a distributed framework for decentralized ML, which allows for the emulation of large-scale learning networks in arbitrary topologies. We demonstrate the capabilities of decentralizepy by deploying techniques such as sparsification and secure aggregation on top of several topologies, including dynamic networks with more than one thousand nodes.","PeriodicalId":370204,"journal":{"name":"Proceedings of the 3rd Workshop on Machine Learning and Systems","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129276173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
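The communication pattern such frameworks emulate can be sketched as gossip averaging over a fixed topology: each node exchanges parameters only with its neighbours and aggregates locally. The ring topology, parameter sizes, and unweighted averaging below are assumptions for illustration and do not use decentralizepy's API.

```python
# Hedged sketch of one synchronous decentralised averaging round on a ring.
import numpy as np

n_nodes, dim = 8, 4
rng = np.random.default_rng(0)
params = [rng.standard_normal(dim) for _ in range(n_nodes)]  # each node's local parameters
# Ring topology: every node talks only to its two neighbours.
neighbours = {i: [(i - 1) % n_nodes, (i + 1) % n_nodes] for i in range(n_nodes)}

def averaging_round(params, neighbours):
    """One gossip round: receive from neighbours, average locally, no central server."""
    return [
        np.mean([params[i]] + [params[j] for j in neighbours[i]], axis=0)
        for i in range(len(params))
    ]

for _ in range(20):
    params = averaging_round(params, neighbours)
print(np.std([p[0] for p in params]))  # spread shrinks as the ring drifts toward consensus
```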
Gradient-less Federated Gradient Boosting Tree with Learnable Learning Rates
Proceedings of the 3rd Workshop on Machine Learning and Systems | Pub Date: 2023-04-15 | DOI: 10.1145/3578356.3592579
Chenyang Ma, Xinchi Qiu, Daniel J. Beutel, N. Lane
{"title":"Gradient-less Federated Gradient Boosting Tree with Learnable Learning Rates","authors":"Chenyang Ma, Xinchi Qiu, Daniel J. Beutel, N. Lane","doi":"10.1145/3578356.3592579","DOIUrl":"https://doi.org/10.1145/3578356.3592579","url":null,"abstract":"The privacy-sensitive nature of decentralized datasets and the robustness of eXtreme Gradient Boosting (XGBoost) on tabular data raise the needs to train XGBoost in the context of federated learning (FL). Existing works on federated XGBoost in the horizontal setting rely on the sharing of gradients, which induce per-node level communication frequency and serious privacy concerns. To alleviate these problems, we develop an innovative framework for horizontal federated XGBoost which does not depend on the sharing of gradients and simultaneously boosts privacy and communication efficiency by making the learning rates of the aggregated tree ensembles learnable. We conduct extensive evaluations on various classification and regression datasets, showing our approach achieves performance comparable to the state-of-the-art method and effectively improves communication efficiency by lowering both communication rounds and communication overhead by factors ranging from 25x to 700x.","PeriodicalId":370204,"journal":{"name":"Proceedings of the 3rd Workshop on Machine Learning and Systems","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123611436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
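A rough sketch of the aggregation idea, under stated assumptions: each client trains its own boosting ensemble locally (no gradients leave the client), and the server combines the ensembles with learnable per-ensemble weights fitted on a small held-out set. The use of scikit-learn instead of XGBoost, the ridge fit for the weights, and the data split are all illustrative choices, not the paper's method.

```python
# Hedged sketch: combine client-trained boosting ensembles with learnable weights.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=600, n_features=10, noise=5.0, random_state=0)
client_idx = np.array_split(np.arange(400), 4)  # four clients with disjoint local data
X_val, y_val = X[400:], y[400:]                 # small set used to fit the ensemble weights

# Each client trains its own boosting ensemble locally; no gradients are shared.
client_models = [
    GradientBoostingRegressor(n_estimators=30, max_depth=3, random_state=0).fit(X[idx], y[idx])
    for idx in client_idx
]

# Server side: the "learnable learning rates" become per-ensemble weights fitted on held-out data.
preds = np.column_stack([m.predict(X_val) for m in client_models])
rates = Ridge(alpha=1.0, fit_intercept=False).fit(preds, y_val).coef_

def federated_predict(x):
    return np.column_stack([m.predict(x) for m in client_models]) @ rates

mse = np.mean((federated_predict(X_val) - y_val) ** 2)
print(rates.round(2), round(float(mse), 1))
```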