{"title":"循环神经网络加速器中模型级并行性的开发","authors":"Lu Peng, Wentao Shi, Jian Zhang, Samuel Irving","doi":"10.1109/MCSoC.2019.00042","DOIUrl":null,"url":null,"abstract":"Recurrent Neural Networks (RNNs) have continued to facilitate rapid progress in a variety of academic and industrial fields, though their complexity continues to make efficient deployment difficult; when the RNN model size is not properly matched to hardware resources, performance can suffer from hardware under-utilization. In this work, we propose to explore model-level parallelism for LSTM-RNN accelerators in different levels of the model using a multicore design. The multi-core design proposed in this work operates in three computing modes: multi-programming mode in which independent models are executed; multithreading mode in which parallelism among layers of an LSTM model is explored and properly scheduled; and helper-core mode in which cores collaborate on a single LSTM layer in a lower model level comparing with multithread mode. Our design can achieve up to 1.98x speedup in \"multi-programming\" mode, a 1.91x speedup in \"multithreading\" mode and a 1.88x speedup in \"helper-core\" mode over the single-core design.","PeriodicalId":104240,"journal":{"name":"2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Exploiting Model-Level Parallelism in Recurrent Neural Network Accelerators\",\"authors\":\"Lu Peng, Wentao Shi, Jian Zhang, Samuel Irving\",\"doi\":\"10.1109/MCSoC.2019.00042\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recurrent Neural Networks (RNNs) have continued to facilitate rapid progress in a variety of academic and industrial fields, though their complexity continues to make efficient deployment difficult; when the RNN model size is not properly matched to hardware resources, performance can suffer from hardware under-utilization. In this work, we propose to explore model-level parallelism for LSTM-RNN accelerators in different levels of the model using a multicore design. The multi-core design proposed in this work operates in three computing modes: multi-programming mode in which independent models are executed; multithreading mode in which parallelism among layers of an LSTM model is explored and properly scheduled; and helper-core mode in which cores collaborate on a single LSTM layer in a lower model level comparing with multithread mode. 
Our design can achieve up to 1.98x speedup in \\\"multi-programming\\\" mode, a 1.91x speedup in \\\"multithreading\\\" mode and a 1.88x speedup in \\\"helper-core\\\" mode over the single-core design.\",\"PeriodicalId\":104240,\"journal\":{\"name\":\"2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)\",\"volume\":\"28 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/MCSoC.2019.00042\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MCSoC.2019.00042","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Exploiting Model-Level Parallelism in Recurrent Neural Network Accelerators
Recurrent Neural Networks (RNNs) continue to drive rapid progress in a variety of academic and industrial fields, but their complexity makes efficient deployment difficult: when the RNN model size is not properly matched to the available hardware resources, performance can suffer from hardware under-utilization. In this work, we explore model-level parallelism for LSTM-RNN accelerators at different levels of the model using a multicore design. The proposed multi-core design operates in three computing modes: a multi-programming mode, in which independent models are executed concurrently; a multithreading mode, in which parallelism among the layers of a single LSTM model is exploited and scheduled; and a helper-core mode, in which cores collaborate on a single LSTM layer, exploiting parallelism at a lower level of the model than the multithreading mode. Compared with a single-core design, our design achieves speedups of up to 1.98x in "multi-programming" mode, 1.91x in "multithreading" mode, and 1.88x in "helper-core" mode.
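The three computing modes summarized above can be read as three ways of mapping LSTM work onto a pool of cores. The sketch below is not the accelerator described in the paper; it is a minimal Python illustration, with assumed core counts, model names, and layer sizes, of how each mode could assign work: one independent model per core (multi-programming), one layer of a single model per core (multithreading), and a slice of one layer's hidden units per core (helper-core).

```python
# Illustrative scheduling sketch only (not the paper's hardware design).
# Core counts, model names, layer sizes, and the "split hidden units across
# helper cores" rule are assumptions made for this example.

from dataclasses import dataclass
from typing import List


@dataclass
class LSTMLayer:
    hidden_size: int  # number of hidden units in this layer


@dataclass
class LSTMModel:
    name: str
    layers: List[LSTMLayer]


def multi_programming(models: List[LSTMModel], num_cores: int):
    """Multi-programming mode: each independent model runs on its own core."""
    return {m.name: core for core, m in zip(range(num_cores), models)}


def multithreading(model: LSTMModel, num_cores: int):
    """Multithreading mode: layers of one model are spread across cores
    (simple round-robin assignment of layer index to core index)."""
    return {layer_idx: layer_idx % num_cores
            for layer_idx, _ in enumerate(model.layers)}


def helper_core(layer: LSTMLayer, num_cores: int):
    """Helper-core mode: cores collaborate on a single layer, each taking
    a contiguous slice of the layer's hidden units."""
    chunk = (layer.hidden_size + num_cores - 1) // num_cores
    return [(core, range(core * chunk,
                         min((core + 1) * chunk, layer.hidden_size)))
            for core in range(num_cores)]


if __name__ == "__main__":
    models = [LSTMModel("model_a", [LSTMLayer(512), LSTMLayer(512)]),
              LSTMModel("model_b", [LSTMLayer(1024)])]
    print(multi_programming(models, num_cores=4))          # model -> core
    print(multithreading(models[0], num_cores=4))          # layer -> core
    print(helper_core(models[1].layers[0], num_cores=4))   # core -> unit slice
```

In this reading, the three modes trade off granularity: multi-programming parallelizes across whole models, multithreading across layers of one model, and helper-core within a single layer, which matches the abstract's description of helper-core mode operating at a lower level of the model.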