Arko Dutt, G. Narasimman, Lin Jie, V. Chandrasekhar, M. Sabry
{"title":"Work-in-Progress: EAST-DNN: Expediting Architectural SimulaTions Using Deep Neural Networks","authors":"Arko Dutt, G. Narasimman, Lin Jie, V. Chandrasekhar, M. Sabry","doi":"10.1145/3349567.3351728","DOIUrl":"https://doi.org/10.1145/3349567.3351728","url":null,"abstract":"A rapid and accurate architectural simulator is a cornerstone for an efficient design-space exploration of computing systems. In this paper, we introduce EAST-DNN, a feed-forward deep neural network, to accelerate architectural simulations. EAST-DNN achieves $> 10^{6}\\times$ speedup with an average prediction error of 4.3% over the baseline simulator. It also achieves an average of $2\\times$ better accuracy with at least $2.3\\times$ speedup compared to state-of-the-art.","PeriodicalId":194982,"journal":{"name":"2019 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121264076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Work-in-Progress: Q-Learning Based Routing for Transiently Powered Wireless Sensor Network","authors":"Zhenge Jia, Yawen Wu, J. Hu","doi":"10.1145/3349567.3351732","DOIUrl":"https://doi.org/10.1145/3349567.3351732","url":null,"abstract":"Reliable communication is a critical concern in power-limited energy harvesting wireless sensor networks (EH-WSNs). The communication optimization is needed since the protocols in battery-powered WSNs cannot adapt to the intermittent harvestable energy sources. In this paper, a novel reinforcement learning (RL) based routing algorithm that fully exploits the capability of wake-up radio (WuR) is presented. This routing strategy aims at increasing the packet delivery rate by leveraging wake-up radio devices to enable receiver nodes to make the decentralized forwarding decision. Simulation results show that the performance of the proposed learning approach, which requires only limited knowledge of the energy harvesting process, has only a small degradation compared to the optimal routing decision with full knowledge of energy harvesting process.","PeriodicalId":194982,"journal":{"name":"2019 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114435931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Work-in-Progress: Cooperative Communication Between Two Transiently Powered Sensors by Reinforcement Learning","authors":"Yawen Wu, Zhenge Jia, Fei Fang, J. Hu","doi":"10.1145/3349567.3351723","DOIUrl":"https://doi.org/10.1145/3349567.3351723","url":null,"abstract":"The transmission between two energy harvesting (EH) powered sensors is successful only when both sensors have enough energy at the same time. Given the scarce, unpredictable, and unevenly distributed energy between two sensors, it is challenging to ensure efficient data transmission. We propose a sensor node architecture with multiple wake-up radios, each with a different ratio of energy consumption on the transmitter and receiver. Two sensors cooperatively select wake-up radios to maximize data throughput. The communication procedure is modeled as a cooperative Markov game with partial observability and multi-agent reinforcement learning (MARL) is employed to maximize the throughput. The proposed methods achieve near-optimal data throughput.","PeriodicalId":194982,"journal":{"name":"2019 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125607482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Work-in-Progress: Offloading Cache Configuration Prediction to an FPGA for Hardware Speedup and Overhead Reduction","authors":"Ruben Vazquez, A. Gordon-Ross, G. Stitt","doi":"10.1145/3349567.3351722","DOIUrl":"https://doi.org/10.1145/3349567.3351722","url":null,"abstract":"In this paper, we present our cache configuration prediction methodology offloaded to an FPGA for improved performance and hardware overhead reduction, while maintaining cache configuration predictions within 5% of the optimal energy cache configuration for application phases for the instruction and data caches.","PeriodicalId":194982,"journal":{"name":"2019 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115955751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Eberle A. Rambo, Thawra Kadeed, R. Ernst, Minjun Seo, F. Kurdahi, Bryan Donyanavard, Caio Batista de Melo, Biswadip Maity, Kasra Moazzemi, Kenneth Stewart, Saehanseul Yi, A. Rahmani, N. Dutt, F. Maurer, N. Doan, A. Surhonne, T. Wild, A. Herkersdorf
{"title":"The Information Processing Factory: A Paradigm for Life Cycle Management of Dependable Systems","authors":"Eberle A. Rambo, Thawra Kadeed, R. Ernst, Minjun Seo, F. Kurdahi, Bryan Donyanavard, Caio Batista de Melo, Biswadip Maity, Kasra Moazzemi, Kenneth Stewart, Saehanseul Yi, A. Rahmani, N. Dutt, F. Maurer, N. Doan, A. Surhonne, T. Wild, A. Herkersdorf","doi":"10.1145/3349567.3357391","DOIUrl":"https://doi.org/10.1145/3349567.3357391","url":null,"abstract":"The number and complexity of embedded system platforms used in mixed-criticality applications are rapidly growing. They run large and evolving applications on heterogeneous multi- or manycore processing platforms requiring dependable operation and long lifetime. Examples include automated and autonomous driving, smart buildings, industry 4.0, and personal medical devices. The Information Processing Factory (IPF) applies principles inspired by factory management to master the complexity of future, highly integrated embedded systems and to provide continuous operation and optimization at runtime. A general objective is to identify a sweet spot between a maximum of autonomy among IPF constituent components and a minimum of centralized control in order to ensure guaranteed service even under strict safety and availability requirements.\nThis paper addresses the challenges of IPF and how to tackle them with a set of techniques: self-diagnosis for early detection of degradation and imminent failures combined with unsupervised platform self-adaptation to meet performance and safety targets.","PeriodicalId":194982,"journal":{"name":"2019 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133573696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Work-in-Progress: Mitigating Inter-Channel Crosstalk Non-Uniformity in Microring Filter Arrays of Photonic NoCs","authors":"Venkata Sai Praneeth Karempudi, Ishan G. Thakkar","doi":"10.1145/3349567.3351729","DOIUrl":"https://doi.org/10.1145/3349567.3351729","url":null,"abstract":"Photonic networks-on-chip (PNoCs) employ photonic links with dense-wavelength-division-multiplexing (DWDM) of channels for parallel signal traversal, along with arrayed microring resonator (MR) filters for parallel signal reception, to enable high-bandwidth on-chip data transfers. Unfortunately, DWDM induces nonuniform inter-channel crosstalk in an MR filter array, which degrades the communication reliability in the link. Overcoming this reliability degradation requires non-uniformly distributed signal power across the utilized data-channels in the link. This increases the total laser power consumption of the link, compared to the ideal case where the crosstalk distribution in the MR filter array is uniform. This paper presents a novel design of MR filter array with minimized crosstalk non-uniformity, which can achieve total optical laser power savings of up to 34% of the link power budget.","PeriodicalId":194982,"journal":{"name":"2019 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117160249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
P. Bogdan, Fan Chen, Aryan Deshwal, J. Doppa, Biresh Kumar Joardar, H. Li, Shahin Nazarian, Linghao Song, Yao Xiao
{"title":"Taming Extreme Heterogeneity via Machine Learning based Design of Autonomous Manycore Systems","authors":"P. Bogdan, Fan Chen, Aryan Deshwal, J. Doppa, Biresh Kumar Joardar, H. Li, Shahin Nazarian, Linghao Song, Yao Xiao","doi":"10.1145/3349567.3357376","DOIUrl":"https://doi.org/10.1145/3349567.3357376","url":null,"abstract":"To avoid rewriting software code for new computer architectures and to take advantage of the extreme heterogeneous processing, communication and storage technologies, there is an urgent need for determining the right amount and type of specialization while making a heterogeneous system as programmable and flexible as possible. To enable both programmability and flexibility in the heterogeneous computing era, we propose a novel complex network inspired model of computation and efficient optimization algorithms for determining the optimal degree of parallelization from old software code. This mathematical framework allows us to determine the required number and type of processing elements, the amount and type of deep memory hierarchy, and the degree of reconfiguration for the communication infrastructure, thus opening new avenues to performance and energy efficiency. Our framework enables heterogeneous manycore systems to autonomously adapt from traditional switching techniques to network coding strategies in order to sustain on-chip communication in the order of terabytes.\nWhile this new programming model enables the design of self-programmable autonomous heterogeneous manycore systems, a number of open challenges will be discussed.","PeriodicalId":194982,"journal":{"name":"2019 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130090153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Work-in-Progress: A SIMD-Aware Pruning Technique for Convolutional Neural Networks with Multi-Sparsity Levels","authors":"Jeonggyu Jang, Kyusik Choi, Hoeseok Yang","doi":"10.1145/3349567.3351718","DOIUrl":"https://doi.org/10.1145/3349567.3351718","url":null,"abstract":"Designs of electronic systems often require considering multiple design concerns. In this paper, we propose a novel multi-phase pruning technique for convolutional neural networks (CNNs) that is capable of efficient exploration of multiple design objectives and constraints. To truly take advantage of the sparsity obtained by pruning, we present two different levels of pruning granularity, fine- and coarse-grain, and show how they are combined in the design space exploration. In particular, we propose to take the SIMD architecture into account in the fine-grain pruning. By iteratively applying pruning to a single CNN, multiple candidates can be obtained from the trade-off between the given design concerns. Experiments with existing CNNs verify that the proposed technique enables more efficient design space exploration over the accuracy-speed trade-off.","PeriodicalId":194982,"journal":{"name":"2019 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"18 22","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113984548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Work-in-Progress: BPNet: Branch-pruned Conditional Neural Network for Systematic Time-accuracy Tradeoff in DNN Inference","authors":"Kyungchul Park, Youngmin Yi","doi":"10.1145/3349567.3351721","DOIUrl":"https://doi.org/10.1145/3349567.3351721","url":null,"abstract":"Recently, there have been attempts to execute the neural network conditionally with auxiliary classifiers allowing early termination depending on the difficulty of the input, which can reduce the execution time or energy consumption with no or negligible accuracy decrease. However, these studies do not systematically consider how many auxiliary classifiers, or branches, should be added or where they should be placed. In this paper, we propose Branch-pruned Conditional Neural Network (BPNet) and its methodology in which the time-accuracy tradeoff for the conditional neural network can be found systematically. We applied BPNet to SqueezeNet, ResNet-20, and VGG-16 with CIFAR-10 and 100. On average, BPNet achieves a 2.0x speedup over the base network without any accuracy drop.","PeriodicalId":194982,"journal":{"name":"2019 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"110 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128618802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Work-in-Progress: Design Space Exploration of Multi-Task Processing on Space Shared FPGAs","authors":"U. Minhas, R. Woods, G. Karakonstantis","doi":"10.1145/3349567.3351724","DOIUrl":"https://doi.org/10.1145/3349567.3351724","url":null,"abstract":"High level synthesis frameworks, such as OpenCL, allow effective design space exploration by scaling of resource allocation via simple to use tunable parameters. The same process can be supported in multi-task processing but long synthesis time hinders system analysis and resource management optimization. This work proposes a methodology for simulation of multi-task processing on FPGA. In doing so, it also supports static spatial partitioning of resources along with a simulator to evaluate this approach. The simulator is based on a multidimensional resource fitting model for spatial evaluation and a machine learning based model for memory access. The results show that the simulator has an accuracy of at least 94.5% on average for throughput evaluation while allowing system design evaluation against various parameters.","PeriodicalId":194982,"journal":{"name":"2019 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"144 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114118074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}