{"title":"分布式机器学习集成的最大时间跨度最小化任务调度","authors":"Jose Monteiro, Óscar Oliveira, Davide Carneiro","doi":"10.1109/ECICE55674.2022.10042894","DOIUrl":null,"url":null,"abstract":"Machine Learning problems are becoming increasingly complex, mostly due to the size of datasets. Data are also generated at increasing speed, which requires models to be updated regularly, at a significant computational cost. The project Continuously Evolving Distributed Ensembles proposes the creation of a distributed Machine Learning environment, in which datasets are divided into fixed-size blocks, and stored in a fault-tolerant distributed file system with replication. The base-models of the Ensembles, with a 1:1 relationship with data blocks, are then trained in a distributed manner, according to the principle of data locality. Specifically, the system is able to select which data blocks to use and in which nodes of the cluster, in order to minimize training time. A similar process takes place when making predictions: the best base-models are selected in real-time, according to their predictive performance and to the state of the nodes where they reside. This paper addresses the problem of assigning base model training tasks to cluster nodes, adhering to the principle of data locality. We present an instance generator and three datasets that will provide a means for comparison while studying other solution methods. 
For testing the system architecture, we solved the datasets with an exact method and the computational results validate, to comply to the project requirements, the need for a more stable and less demanding (in computational resource terms) solution method.","PeriodicalId":282635,"journal":{"name":"2022 IEEE 4th Eurasia Conference on IOT, Communication and Engineering (ECICE)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Task Scheduling with Makespan Minimization for Distributed Machine Learning Ensembles\",\"authors\":\"Jose Monteiro, Óscar Oliveira, Davide Carneiro\",\"doi\":\"10.1109/ECICE55674.2022.10042894\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Machine Learning problems are becoming increasingly complex, mostly due to the size of datasets. Data are also generated at increasing speed, which requires models to be updated regularly, at a significant computational cost. The project Continuously Evolving Distributed Ensembles proposes the creation of a distributed Machine Learning environment, in which datasets are divided into fixed-size blocks, and stored in a fault-tolerant distributed file system with replication. The base-models of the Ensembles, with a 1:1 relationship with data blocks, are then trained in a distributed manner, according to the principle of data locality. Specifically, the system is able to select which data blocks to use and in which nodes of the cluster, in order to minimize training time. A similar process takes place when making predictions: the best base-models are selected in real-time, according to their predictive performance and to the state of the nodes where they reside. This paper addresses the problem of assigning base model training tasks to cluster nodes, adhering to the principle of data locality. 
We present an instance generator and three datasets that will provide a means for comparison while studying other solution methods. For testing the system architecture, we solved the datasets with an exact method and the computational results validate, to comply to the project requirements, the need for a more stable and less demanding (in computational resource terms) solution method.\",\"PeriodicalId\":282635,\"journal\":{\"name\":\"2022 IEEE 4th Eurasia Conference on IOT, Communication and Engineering (ECICE)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-10-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE 4th Eurasia Conference on IOT, Communication and Engineering (ECICE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ECICE55674.2022.10042894\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 4th Eurasia Conference on IOT, Communication and Engineering (ECICE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ECICE55674.2022.10042894","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Task Scheduling with Makespan Minimization for Distributed Machine Learning Ensembles
Machine Learning problems are becoming increasingly complex, mostly due to the size of datasets. Data are also generated at increasing speed, which requires models to be updated regularly at a significant computational cost. The project Continuously Evolving Distributed Ensembles proposes a distributed Machine Learning environment in which datasets are divided into fixed-size blocks and stored in a fault-tolerant, replicated distributed file system. The base models of the ensembles, in a 1:1 relationship with the data blocks, are then trained in a distributed manner according to the principle of data locality. Specifically, the system selects which data blocks to use, and on which nodes of the cluster, in order to minimize training time. A similar process takes place when making predictions: the best base models are selected in real time, according to their predictive performance and the state of the nodes where they reside. This paper addresses the problem of assigning base-model training tasks to cluster nodes while adhering to the principle of data locality. We present an instance generator and three datasets that provide a means of comparison for studying other solution methods. To test the system architecture, we solved the datasets with an exact method; the computational results validate the need for a more stable and less computationally demanding solution method in order to comply with the project requirements.
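The scheduling problem the abstract describes — assigning block-training tasks to cluster nodes so that each task runs only on a node holding a replica of its data block, while minimizing the makespan (the maximum total load on any node) — can be sketched with a simple greedy heuristic. This is an illustrative simplification, not the exact method used in the paper; all task durations, block names, and node names below are hypothetical.

```python
def greedy_makespan(tasks, replicas, nodes):
    """Greedy locality-constrained scheduling sketch.

    tasks: dict mapping task id -> estimated training duration.
    replicas: dict mapping task id -> set of nodes holding that
              task's data block (the data-locality constraint).
    nodes: list of cluster node ids.
    Returns (assignment, makespan).
    """
    load = {n: 0 for n in nodes}
    assignment = {}
    # Longest-processing-time-first: schedule the longest tasks first,
    # placing each on the least-loaded node eligible for its block.
    # sorted() on the replica set makes tie-breaking deterministic.
    for task, dur in sorted(tasks.items(), key=lambda kv: -kv[1]):
        node = min(sorted(replicas[task]), key=lambda n: load[n])
        assignment[task] = node
        load[node] += dur
    return assignment, max(load.values())

# Hypothetical instance: four block-training tasks, three nodes,
# each block replicated on two nodes.
tasks = {"b1": 5, "b2": 3, "b3": 4, "b4": 2}
replicas = {"b1": {"n1", "n2"}, "b2": {"n2", "n3"},
            "b3": {"n1", "n3"}, "b4": {"n1", "n2"}}
assignment, makespan = greedy_makespan(tasks, replicas, ["n1", "n2", "n3"])
```

Such a heuristic gives feasible schedules quickly but offers no optimality guarantee, which is precisely the trade-off the abstract raises against the exact (but computationally demanding and less stable) method used for validation.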