DAC-SGD: A Distributed Stochastic Gradient Descent Algorithm Based on Asynchronous Connection
Authors: Aijia He, Zehong Chen, Weichen Li, Xingying Li, Hongjun Li, Xin Zhao
Published in: Proceedings of the 2nd International Conference on Intelligent Information Processing, 2017-07-17
DOI: 10.1145/3144789.3144815 (https://doi.org/10.1145/3144789.3144815)
In data mining practice, an algorithm often has to work with multiple distributed data sources: the required datasets are held by different companies or organizations and reside in different systems and technology environments. Traditional mining solutions copy the data from each source and integrate it into a homogeneous computation environment before mining can run, which incurs large data transmission and high storage costs. Mining can even become infeasible because of data ownership problems. This paper presents a distributed asynchronous connection approach for the widely used stochastic gradient descent (SGD) algorithm, together with a distributed implementation, to cope with the multiple distributed data source problem. The main process of the algorithm executes asynchronously on distributed computation nodes, and the model is trained locally at each data source in its own computation environment, which avoids data integration and centralized processing. The feasibility and performance of the proposed algorithm are evaluated through experimental studies.
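The abstract does not give the update rule or the connection protocol, so the following is only a minimal sketch of the general idea of asynchronous SGD over data that stays at its source, not the authors' DAC-SGD implementation. It assumes a simple linear model y = w*x + b, synthetic local datasets, and threads standing in for remote computation nodes; `ParameterStore`, `make_local_data`, `worker`, and `train` are hypothetical names introduced for illustration.

```python
import random
import threading


class ParameterStore:
    """Hypothetical coordinator holding the shared model (w, b) for y = w*x + b."""

    def __init__(self):
        self._lock = threading.Lock()
        self._w = 0.0
        self._b = 0.0

    def pull(self):
        # A node fetches the current parameters; they may be stale by the
        # time its gradient comes back.
        with self._lock:
            return self._w, self._b

    def push(self, grad_w, grad_b, lr):
        # Apply a gradient as soon as it arrives, without waiting for
        # the other nodes (no synchronization barrier).
        with self._lock:
            self._w -= lr * grad_w
            self._b -= lr * grad_b


def make_local_data(n, seed):
    """Synthetic stand-in for one organization's private dataset (assumed)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = 3.0 * x + 1.0 + rng.gauss(0.0, 0.01)
        data.append((x, y))
    return data


def worker(store, data, steps, lr, batch_size, seed):
    """One data source: only gradients leave the node, never raw records."""
    rng = random.Random(seed)
    for _ in range(steps):
        w, b = store.pull()
        batch = rng.sample(data, batch_size)
        grad_w = 0.0
        grad_b = 0.0
        for x, y in batch:
            err = (w * x + b) - y
            # Gradient of the mean squared error over the mini-batch.
            grad_w += 2.0 * err * x / batch_size
            grad_b += 2.0 * err / batch_size
        store.push(grad_w, grad_b, lr)


def train(num_sources=3, steps=2000, lr=0.05, batch_size=8):
    store = ParameterStore()
    threads = []
    for i in range(num_sources):
        data = make_local_data(200, seed=i)  # stays local to worker i
        t = threading.Thread(
            target=worker, args=(store, data, steps, lr, batch_size, 100 + i)
        )
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    return store.pull()


if __name__ == "__main__":
    w, b = train()
    print(f"w={w:.3f}, b={b:.3f}")
```

In this sketch each worker reads parameters, computes a mini-batch gradient on its own data, and pushes the result without coordinating with the others, so only gradients and parameters cross node boundaries. A real deployment would replace the in-process store with network communication between organizations, which the abstract does not describe.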