2022 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES): Latest Publications

Work-in-Progress: Ultra-fast yet Accurate Performance Prediction for Deep Neural Network Accelerators
Konstantin Lübeck, Alexander Louis-Ferdinand Jung, Felix Wedlich, O. Bringmann
DOI: 10.1109/CASES55004.2022.00020
Abstract: We present an automatic methodology to accurately predict the performance of Deep Neural Network (DNN) accelerators using abstract descriptions of accelerator architectures and DNNs with a high degree of flexibility. By mapping partially unrolled neural network layers onto accelerator architectures, we automatically construct an analytical performance model, exploiting the dataflow-driven nature of DNNs, which allows us to evaluate only a few loop iterations to determine the performance of a whole DNN layer.
Citations: 0
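The key idea, evaluating only a few loop iterations of a regular layer loop nest and extrapolating to the whole layer, can be sketched as follows (a hypothetical toy model with illustrative cycle counts, not the authors' methodology):

```python
def predict_layer_latency(first_iter_cycles, steady_iter_cycles,
                          last_iter_cycles, num_iterations):
    """Extrapolate whole-layer latency from a few sampled loop iterations.

    Because DNN layer loops are dataflow-driven and highly regular, the
    steady-state iteration latency repeats unchanged; only the first and
    last iterations (pipeline fill and drain) differ.
    """
    if num_iterations == 1:
        return first_iter_cycles
    return (first_iter_cycles
            + steady_iter_cycles * (num_iterations - 2)
            + last_iter_cycles)

# e.g. a conv layer tiled into 1024 iterations on a hypothetical accelerator
total_cycles = predict_layer_latency(120, 64, 90, 1024)
```

Only three iterations need to be evaluated in detail, regardless of how many times the loop body executes.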
Work-in-Progress: Toward a Robust, Reconfigurable Hardware Accelerator for Tree-Based Genetic Programming
Christopher Crary, Wesley Piard, Britton Chesley, G. Stitt
DOI: 10.1109/CASES55004.2022.00015
Abstract: Genetic programming (GP) is a general, broadly effective procedure by which computable solutions are constructed from high-level objectives. As with other machine-learning endeavors, one continual trend for GP is to exploit ever-larger amounts of parallelism. In this paper, we explore the possibility of accelerating GP with modern field-programmable gate arrays (FPGAs), motivated by the fact that FPGAs can sometimes leverage larger amounts of both function and data parallelism, common characteristics of GP, than CPUs and GPUs. As a first step toward more general acceleration, we present a preliminary accelerator for the evaluation phase of "tree-based GP", the original and still popular flavor of GP, in which the FPGA dynamically compiles programs of varying shapes and sizes onto a reconfigurable function-tree pipeline. Overall, when compared to a recent open-source GPU solution implemented on a modern 8 nm process node, our accelerator implemented on an older 20 nm FPGA achieves an average speedup of 9.7×. Although our accelerator is 7.9× slower than most examples of a state-of-the-art CPU solution implemented on a recent 7 nm process node, we describe future extensions that can make FPGA acceleration provide attractive Pareto-optimal tradeoffs.
Citations: 0
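The evaluation phase targeted by the accelerator streams flattened program trees through a function pipeline. A minimal software analogue evaluates a GP individual in postfix (flattened-tree) form; this is an illustrative sketch of tree-based GP evaluation, not the paper's hardware design:

```python
import operator

# Function set of the toy GP problem (an assumption for illustration)
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def eval_postfix(program, x):
    """Evaluate a tree-based GP individual given in postfix form.

    This is the software analogue of streaming a flattened tree through
    a function-tree pipeline: each token either pushes a terminal
    (the input variable 'x' or a constant) or applies a function to the
    two most recent intermediate results.
    """
    stack = []
    for tok in program:
        if tok == "x":
            stack.append(x)
        elif isinstance(tok, (int, float)):
            stack.append(tok)
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(OPS[tok](a, b))
    return stack[0]

# The tree (x + 2) * x flattens to postfix: x 2 + x *
result = eval_postfix(["x", 2, "+", "x", "*"], 3)  # evaluates to 15
```

On an FPGA, the speedup comes from evaluating many such programs and data points concurrently rather than one token at a time.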
Work-in-Progress: Cooperative MLP-Mixer Networks Inference On Heterogeneous Edge Devices through Partition and Fusion
Yiming Li, Shouzhen Gu, Mingsong Chen
DOI: 10.1109/CASES55004.2022.00021
Abstract: As a newly proposed DNN architecture, MLP-Mixer is attracting increasing attention due to its competitive results compared to CNNs and attention-based networks on various tasks. Although MLP-Mixer contains only MLP layers, it still suffers from high communication costs in edge-computing scenarios, resulting in long inference times. To improve the inference performance of an MLP-Mixer model on interconnected, resource-constrained heterogeneous edge devices, this paper proposes a novel partition-and-fusion method specific to MLP-Mixer layers that significantly reduces communication costs. Experimental results show that, as the number of devices increases from 2 to 6, our method achieves 1.01-1.27x and 1.54-3.12x speedups in scenarios with heterogeneous and homogeneous devices, respectively.
Citations: 0
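A common way to partition a fully-connected (MLP) layer across devices is to split the weight matrix by output columns, so each device computes only its slice of output features and the partial results are gathered ("fused") once at the end. The sketch below is a generic illustration of that idea, not the paper's specific partition-and-fusion method:

```python
def partition_linear(x, W, num_devices):
    """Column-wise partition of a linear layer y = x @ W across devices.

    Device d computes only its contiguous slice of output columns from
    the full input vector, so no intermediate activations are exchanged
    during the layer; a single gather at the end reassembles the output.
    """
    cols = len(W[0])
    step = (cols + num_devices - 1) // num_devices  # ceil division
    partials = []
    for d in range(num_devices):
        lo, hi = d * step, min((d + 1) * step, cols)
        partials.append([sum(x[k] * W[k][j] for k in range(len(x)))
                         for j in range(lo, hi)])
    # "Fusion": concatenate the per-device slices in device order.
    return [v for part in partials for v in part]
```

Reducing how often such gathers happen (e.g., fusing several consecutive layers per device) is what cuts communication cost in cooperative edge inference.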
Work-in-Progress: Towards a Smaller than Grain Stream Cipher: Optimized FPGA Implementations of Fruit-80
Gangqiang Yang, Zhengyuan Shi, Cheng Chen, Hailiang Xiong, Honggang Hu, Zhiguo Wan, Keke Gai, Meikang Qiu
DOI: 10.1109/CASES55004.2022.00016
Abstract: Fruit-80, an ultra-lightweight stream cipher with an 80-bit secret key, is oriented toward resource-constrained devices in the Internet of Things. In this paper, we propose area- and speed-optimized architectures for Fruit-80 on FPGAs. The area-optimized architecture reuses the NFSR and LFSR feedback functions and achieves the most suitable ratio of look-up tables to flip-flops. The speed-optimized architecture adopts a hybrid approach to parallelization and reduces the latency of long data paths by pre-generating the primary feedback and inserting flip-flops. The optimal throughput-to-area ratio of the speed-optimized architecture is better than that of Grain v1, and the area-optimized architecture occupies only 35 slices on a Xilinx Spartan-3 FPGA, smaller than Grain and other common stream ciphers. To the best of our knowledge, this result sets a new record for the minimum area of a lightweight cipher implementation on an FPGA.
Citations: 0
Work-in-Progress: An Open-Source Platform for Design and Programming of Partially Reconfigurable Heterogeneous SoCs
Biruk B. Seyoum, Davide Giri, Kuan-Lin Chiu, L. Carloni
DOI: 10.1109/CASES55004.2022.00019
Abstract: Dynamic partial reconfiguration (DPR) enables the design and implementation of flexible, scalable, and robust adaptive systems. We present an FPGA-based DPR flow for partially reconfigurable heterogeneous SoCs that uses an incremental compilation technique to reduce the total FPGA compilation time.
Citations: 0
Work-in-Progress: ExpCache: Online-Learning based Cache Replacement Policy for Non-Volatile Memory
Jinfeng Yang, Bingzhe Li, Jianjun Yuan, Zhaoyan Shen, H. Du, D. Lilja
DOI: 10.1109/CASES55004.2022.00010
Abstract: As emerging memory technologies such as non-volatile memory (NVM) reach the market and machine learning algorithms are successfully applied in many fields, cache replacement policies for NVM-based systems that integrate machine learning are worth exploring to improve system performance. In this work, we propose a machine-learning-based cache replacement algorithm, named ExpCache, to improve the performance of systems with NVM as main memory. Exploiting the non-volatility of NVM devices, we split the NVM into a read cache and a write cache for retaining different types of requests. The pages in each cache are managed by both LRU and LFU policies to balance the recency and frequency of workloads, and an online Expert machine-learning algorithm selects the proper policy for evicting a page from one of the caches based on workload access patterns. Experimental results show that ExpCache outperforms previous approaches in terms of hit ratio and the number of dirty pages written back to storage.
Citations: 0
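The online Expert step, choosing between LRU and LFU based on which policy has been making fewer mistakes, can be illustrated with a multiplicative-weights chooser. This is a hypothetical sketch of the general experts framework; the learning rate `eta`, the greedy selection, and the mistake signal are illustrative assumptions, not the paper's exact algorithm:

```python
class ExpertPolicyChooser:
    """Choose between two eviction 'experts' (LRU and LFU) online.

    Each expert starts with equal weight; an expert whose eviction
    decision later proves wrong (the evicted page is re-referenced,
    causing a miss) has its weight multiplicatively reduced.
    """

    def __init__(self, eta=0.3):
        self.weights = {"LRU": 1.0, "LFU": 1.0}
        self.eta = eta  # learning rate (illustrative value)

    def choose(self):
        # Follow the currently best-performing expert (greedy variant;
        # sampling proportionally to weight is also common).
        return max(self.weights, key=self.weights.get)

    def report_mistake(self, expert):
        # Multiplicative-weights update on the expert that erred.
        self.weights[expert] *= (1.0 - self.eta)
```

A cache would consult `choose()` at each eviction and call `report_mistake()` whenever a recently evicted page is requested again.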
On Evaluation of On-chip Thermal Covert Channel Attacks
Jiachen Wang, Xiaohang Wang, Yingtao Jiang, A. Singh, Letian Huang, Mei Yang
DOI: 10.1109/CASES55004.2022.00011
Abstract: Thermal covert channel (TCC) attacks have become a serious security concern for many-core chips. The severity of these attacks is directly linked to the TCC's transmission rate and bit error rate (BER), both of which are affected by the transmission characteristics of thermal signals and the adopted encoding, modulation, and multiplexing schemes. This paper examines, compares, and analyzes TCCs built upon different combinations of encoding, modulation, and multiplexing. In particular, our study shows that a TCC using non-return-to-zero (NRZ) line coding and frequency-shift keying (FSK) modulation achieves the highest throughput, 120 bps, with a BER below 10%.
Citations: 1
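In the winning NRZ-plus-FSK scheme, each bit selects one of two signalling frequencies for the sender core's activity pattern, and the receiver classifies each bit period by its dominant frequency. The toy model below modulates and demodulates a bit string; the frequencies, sample rate, and zero-crossing detector are illustrative assumptions, not measured thermal parameters:

```python
import math

def fsk_modulate(bits, f0=50, f1=100, samples_per_bit=200, rate=1000):
    """Map each NRZ-coded bit to a sine burst at one of two frequencies,
    a toy stand-in for the sender core modulating its heat output."""
    signal = []
    for b in bits:
        f = f1 if b else f0
        for n in range(samples_per_bit):
            signal.append(math.sin(2 * math.pi * f * n / rate))
    return signal

def fsk_demodulate(signal, f0=50, f1=100, samples_per_bit=200, rate=1000):
    """Recover bits by counting zero crossings per bit period: a burst at
    f crosses zero about 2*f*(samples_per_bit/rate) times, so a midpoint
    threshold separates the two frequencies."""
    threshold = (f0 + f1) * samples_per_bit / rate
    bits = []
    for i in range(0, len(signal), samples_per_bit):
        chunk = signal[i:i + samples_per_bit]
        crossings = sum(1 for a, b in zip(chunk, chunk[1:]) if a * b < 0)
        bits.append(1 if crossings > threshold else 0)
    return bits
```

A real receiver would instead observe its own thermal sensor and contend with the chip's low-pass thermal response, which is what limits the channel to rates on the order of the reported 120 bps.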
Welcome Message from the CASES 2022 Program Chairs
DOI: 10.1109/cases55004.2022.00005
Citations: 0
Work-in-Progress: CAMiSE: Content Addressable Memory-integrated Searchable Encryption
Arnab Bag, Sikhar Patranabis, Debdeep Mukhopadhyay
DOI: 10.1109/CASES55004.2022.00017
Abstract: Searchable symmetric encryption (SSE) aims to support efficient query execution directly over encrypted databases. Practical implementations of SSE suffer from performance bottlenecks due to randomised memory accesses and computation-intensive cryptographic operations. We propose CAMiSE, a fully associative memory-integrated framework for designing SSE systems with fast query processing over large databases. We show a novel use of custom-designed Content Addressable Memory (CAM) to minimise storage-access latencies during query execution in SSE systems, and we prototype a well-known SSE scheme, Oblivious Cross Tags (OXT), within this framework. Our implementation achieves a 5x-7x speed-up over traditional software-based implementations while scaling smoothly to real-world databases with millions of records.
Citations: 1
Work-in-Progress: NoRF: A Case Against Register File Operands in Tightly-Coupled Accelerators
David J. Schlais, Heng Zhuo, Mikko H. Lipasti
DOI: 10.1109/CASES55004.2022.00028
Abstract: Accelerators are often used to increase the performance and/or energy efficiency of general-purpose CPUs. However, Tightly-Coupled Accelerators (TCAs) often perform computations on data structures that may not be a natural fit for general-purpose registers. The designer can either use the existing register file (RF), use an RF tailored for the accelerator, or eschew an RF entirely (NoRF) and access operands directly from the memory hierarchy. Designers of embedded and edge devices are particularly conscious of energy-efficient compute and data transfer. We explore the possibility of mini-DGEMM accelerators (example TCAs) within the context of CPUs and edge devices, which have increasing applications for DGEMM compute. At a high level, register files help reduce memory accesses (steps 1, 2, 5, and 6 in Figure 1) when the compiler finds reuse of operands in the program dataflow. On the other hand, direct memory access simplifies data movement by eliminating the intermediate reads and writes to a register file, but issues more memory requests. This paper evaluates the difference between these options for operand delivery. Figure 2 shows that all recent vector extensions use a register-file implementation; by this trend, it may seem natural to incorporate mini-matrices into the RF as well. However, we present quantitative and qualitative evidence to advocate direct cache access for operands.
Citations: 0