Graph Attention Neural Network Distributed Model Training
Armin Esmaeilzadeh, Mina Esmail Zadeh Nojoo Kambar, Maryam Heidari
2022 IEEE World AI IoT Congress (AIIoT), published 2022-06-06. DOI: 10.1109/aiiot54504.2022.9817156
The scale of neural language models has increased significantly in recent years. As a result, the time and resources required to train larger language models have been growing at an even higher rate. In this research, we propose a distributed implementation of a Graph Attention Neural Network model with 120 million parameters and train it on a cluster of eight GPUs. Compared to training on a single GPU instance, we demonstrate a threefold speedup in model training while maintaining stable accuracy and loss during training and testing.
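The paper itself does not include code, but as a rough illustration of what data-parallel training of a graph attention model across multiple GPUs could look like, here is a minimal sketch using PyTorch's DistributedDataParallel together with PyTorch Geometric's GATConv. The model dimensions, the synthetic graph, and all hyperparameters are placeholders for illustration only, not the authors' 120M-parameter configuration.

# Minimal sketch of data-parallel GAT training across multiple GPUs.
# Model sizes, graph data, and hyperparameters are placeholders, not the
# configuration used in the paper.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP
from torch_geometric.nn import GATConv


class GAT(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim, heads=8):
        super().__init__()
        self.conv1 = GATConv(in_dim, hidden_dim, heads=heads)       # multi-head attention layer
        self.conv2 = GATConv(hidden_dim * heads, out_dim, heads=1)  # single-head output layer

    def forward(self, x, edge_index):
        x = torch.relu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)


def train(rank, world_size):
    # One process per GPU; NCCL performs the gradient all-reduce between ranks.
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = DDP(GAT(in_dim=128, hidden_dim=64, out_dim=16).to(rank), device_ids=[rank])
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Synthetic graph: 1000 nodes with random features, edges, and labels (placeholder data).
    x = torch.randn(1000, 128, device=rank)
    edge_index = torch.randint(0, 1000, (2, 5000), device=rank)
    y = torch.randint(0, 16, (1000,), device=rank)

    for epoch in range(10):
        optimizer.zero_grad()
        out = model(x, edge_index)
        loss = torch.nn.functional.cross_entropy(out, y)
        loss.backward()  # DDP averages gradients across all ranks during backward
        optimizer.step()
        if rank == 0:
            print(f"epoch {epoch}: loss {loss.item():.4f}")

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = torch.cuda.device_count()  # e.g. 8 GPUs, as in the paper's cluster
    mp.spawn(train, args=(world_size,), nprocs=world_size)

In this data-parallel setup, each GPU holds a full replica of the model and gradients are synchronized after every backward pass, which is one common way to obtain near-linear speedups of the kind the abstract reports; the authors' actual partitioning and training strategy may differ.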