Proceedings of the 17th ACM International Conference on Computing Frontiers: Latest Papers

Contention-aware application performance prediction for disaggregated memory systems
Proceedings of the 17th ACM International Conference on Computing Frontiers Pub Date : 2020-05-11 DOI: 10.1145/3387902.3392625
F. V. Zacarias, Rajiv Nishtala, P. Carpenter
Abstract: Disaggregated memory has recently been proposed as a way to allow flexible and fine-grained allocation of memory capacity to compute jobs. This paper makes an important step towards effective resource allocation on disaggregated memory systems. Specifically, we propose a generic approach to predict the performance degradation due to sharing of disaggregated memory. In contrast to prior work, cache capacity is not shared among multiple applications, which removes a major contributor to application performance. For this reason, our analysis is driven by the demand for memory bandwidth, which has been shown to have an important effect on application performance. We show that profiling the application slowdown often involves significant experimental error and noise, and we improve the accuracy by linear smoothing of the sensitivity curves. We also show that contention is sensitive to the ratio between read and write memory accesses, and we address this sensitivity by building a family of sensitivity curves according to the read/write ratios. Our results show that the methodology predicts the slowdown in application performance subject to memory contention with an average error of 1.19% and a maximum error of 14.6%. Compared with the state of the art, the relative improvements are almost 24% on average and 33% for the worst case.
Citations: 5
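The abstract of the entry above mentions smoothing noisy sensitivity curves and keeping one curve per read/write ratio. The following is a minimal sketch of that idea only. The least-squares line fit is an assumption standing in for the paper's "linear smoothing" (the abstract does not specify the method), and every function name and sample value here is hypothetical rather than taken from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x.

    Assumption: a straight-line fit is used as the smoothing step.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict_slowdown(curves, read_ratio, contending_bw):
    """Evaluate the smoothed curve whose read ratio is closest to the request."""
    nearest = min(curves, key=lambda ratio: abs(ratio - read_ratio))
    intercept, slope = curves[nearest]
    return intercept + slope * contending_bw


# Hypothetical noisy profiling samples, keyed by read ratio:
# (contending bandwidth in GB/s, measured slowdown factor)
samples = {
    0.5: ([0, 5, 10, 15, 20], [1.00, 1.06, 1.09, 1.16, 1.20]),
    1.0: ([0, 5, 10, 15, 20], [1.00, 1.02, 1.05, 1.07, 1.10]),
}

# One smoothed sensitivity curve per read/write ratio.
curves = {ratio: fit_line(xs, ys) for ratio, (xs, ys) in samples.items()}

print(predict_slowdown(curves, read_ratio=0.9, contending_bw=10))
```

Selecting the nearest curve is the simplest possible lookup; interpolating between neighbouring read/write ratios would be an equally plausible choice.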
Design of an open-source bridge between non-coherent burst-based and coherent cache-line-based memory systems
Proceedings of the 17th ACM International Conference on Computing Frontiers Pub Date : 2020-05-11 DOI: 10.1145/3387902.3392631
Matheus A. Cavalcante, Andreas Kurth, Fabian Schuiki, L. Benini
Abstract: In heterogeneous computer architectures, the serial part of an application is coupled with domain-specific accelerators that promise high computing throughput and efficiency across a wide range of applications. In such systems, the serial part of a program is executed on a Central Processing Unit (CPU) core optimized for single-thread performance, while parallel sections are offloaded to Programmable Manycore Accelerators (PMCAs). This heterogeneity requires CPU cores and PMCAs to share data in memory efficiently, although CPUs rely on a coherent memory system where data is transferred in cache lines, while PMCAs are based on non-coherent scratchpad memories where data is transferred in bursts by DMA engines. In this paper, we tackle the challenges and hardware complexity of bridging the gap from a non-coherent, burst-based memory hierarchy to a coherent, cache-line-based one. We design and implement an open-source hardware module that reaches 97% peak throughput over a wide range of realistic linear algebra kernels and is suited for a wide spectrum of memory architectures. Implemented in a state-of-the-art 22 nm FD-SOI technology, our module bridges up to 650 Gbps at 130 fJ/bit and has a complexity of less than 1 kGE/Gbps.
Citations: 5
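The figures quoted in the abstract of the entry above can be combined with simple arithmetic. The sketch below assumes that the 130 fJ/bit energy figure and the 1 kGE/Gbps bound both apply at the full 650 Gbps, which the abstract does not state explicitly; the variable names are illustrative.

```python
# Figures quoted in the abstract.
throughput_bps = 650e9          # 650 Gbps
energy_per_bit_j = 130e-15      # 130 fJ/bit
complexity_kge_per_gbps = 1.0   # upper bound: less than 1 kGE/Gbps

# Power = bits per second * joules per bit (assumes both figures hold together).
power_w = throughput_bps * energy_per_bit_j

# Area bound in kilo gate equivalents at the peak bridged bandwidth.
area_bound_kge = complexity_kge_per_gbps * throughput_bps / 1e9

print(f"implied power at peak throughput: {power_w * 1e3:.1f} mW")
print(f"implied complexity bound: < {area_bound_kge:.0f} kGE")
```

Under these assumptions the module would draw roughly 84.5 mW and occupy fewer than 650 kGE at peak bandwidth.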
StoneCutter
Proceedings of the 17th ACM International Conference on Computing Frontiers Pub Date : 2020-05-11 DOI: 10.1145/3387902.3394029
J. Leidel, D. Donofrio, Frank Conlon
Abstract: As the density and capability of reconfigurable computing using FPGAs continue to increase and access to large-scale ASIC integration continues to expand, research activities associated with high-level synthesis flows have expanded at a similar rate. The goal of these research efforts is to reduce the time and effort required to construct and deploy application-specific architectures. However, these synthesis techniques often force users to consider the entire circuit design space in order to develop a successful implementation. This lack of design specificity often results in hardware design implementations that are difficult to program, difficult to reuse in future designs, and make sub-optimal use of hardware resources. In this work we introduce the StoneCutter instruction set design language and tool infrastructure. StoneCutter provides a familiar, C-like language construct by which to develop the implementation for individual, programmable instructions. The LLVM-based StoneCutter compiler performs individual instruction and whole-ISA optimizations in order to generate a high-performance Chisel HDL representation of the target design. Utilizing the existing Chisel tools, users can also generate cycle-accurate C++ simulation models as well as Verilog representations of the target design. As a result, StoneCutter provides a very rapid design environment for development and experimentation.
Citations: 0
Time-sliced quantum circuit partitioning for modular architectures
Proceedings of the 17th ACM International Conference on Computing Frontiers Pub Date : 2020-05-11 DOI: 10.1145/3387902.3392617
Jonathan M. Baker, Casey Duckering, Alexander P. Hoover, F. Chong
Abstract: Current quantum computer designs will not scale. To scale beyond small prototypes, quantum architectures will likely adopt a modular approach with clusters of tightly connected quantum bits and sparser connections between clusters. We exploit this clustering and the statically-known control flow of quantum programs to create tractable partitioning heuristics which map quantum circuits to modular physical machines one time slice at a time. Specifically, we create optimized mappings for each time slice, accounting for the cost to move data from the previous time slice and using a tunable lookahead scheme to reduce the cost to move to future time slices. We compare our approach to a traditional statically-mapped, owner-computes model. Our results show strict improvement over the static mapping baseline. We reduce the non-local communication overhead by 89.8% in the best case and by 60.9% on average. Our techniques, unlike many exact solver methods, are computationally tractable.
Citations: 27
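The trade-off described in the abstract of the entry above (pay to move qubits between time slices so that fewer gates cross clusters) can be illustrated with a small cost-accounting sketch. This is only bookkeeping under stated assumptions: the paper's partitioning heuristic and lookahead scheme are not reproduced, a qubit move and a non-local gate are assumed to cost one unit each, and the circuit, mappings, and function names are hypothetical.

```python
def nonlocal_gates(gates, assignment):
    """Count two-qubit gates whose operands sit in different clusters."""
    return sum(1 for a, b in gates if assignment[a] != assignment[b])


def moved_qubits(prev, curr):
    """Count qubits that change cluster between consecutive time slices."""
    return sum(1 for q in curr if prev[q] != curr[q])


def total_cost(slices, assignments):
    """Non-local gates in every slice plus moves between consecutive slices."""
    cost = sum(nonlocal_gates(g, a) for g, a in zip(slices, assignments))
    cost += sum(moved_qubits(p, c) for p, c in zip(assignments, assignments[1:]))
    return cost


# Hypothetical 4-qubit circuit on two clusters, split into three time slices.
slices = [
    [(0, 1), (2, 3)],
    [(0, 2), (1, 3)],
    [(0, 2), (1, 3)],
]

# Static baseline: one qubit-to-cluster mapping for the whole program.
static_map = {0: 0, 1: 0, 2: 1, 3: 1}
static_cost = total_cost(slices, [static_map] * len(slices))

# Time-sliced mapping: qubits 1 and 2 swap clusters after the first slice.
remapped = {0: 0, 1: 1, 2: 0, 3: 1}
sliced_cost = total_cost(slices, [static_map, remapped, remapped])

print(static_cost, sliced_cost)
```

In this toy case the static mapping pays for four non-local gates, while the time-sliced mapping pays for two qubit moves and no non-local gates.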
Quantum splines for non-linear approximations
Proceedings of the 17th ACM International Conference on Computing Frontiers Pub Date : 2020-05-11 DOI: 10.1145/3387902.3394032
A. Macaluso, L. Clissa, Stefano Lodi, Claudio Sartori
Abstract: Quantum computing offers a new paradigm for efficient computing, and many AI applications could benefit from its potential boost in performance. However, the main limitation is the constraint to linear operations, which hampers the representation of complex relationships in data. In this work, we propose an efficient implementation of quantum splines for non-linear approximation. In particular, we first discuss possible parametrisations and select the most convenient one for exploiting the HHL algorithm to obtain the estimates of spline coefficients. Then, we investigate QSpline performance as an evaluation routine for some of the most popular activation functions adopted in ML. Finally, a detailed comparison with classical alternatives to HHL is also presented.
Citations: 3
Freeway
Proceedings of the 17th ACM International Conference on Computing Frontiers Pub Date : 2020-05-11 DOI: 10.1145/3387902.3394028
Yifan Shen, Ke Liu, Ziting Guo, Wenli Zhang, Guanghui Zhang, V. Aggarwal, Mingyu Chen
Citations: 1
Enabling mixed-precision quantized neural networks in extreme-edge devices
Proceedings of the 17th ACM International Conference on Computing Frontiers Pub Date : 2020-05-11 DOI: 10.1145/3387902.3394038
Nazareno Bruschi, Angelo Garofalo, Francesco Conti, Giuseppe Tagliavini, D. Rossi
Abstract: The deployment of Quantized Neural Networks (QNNs) on advanced microcontrollers requires optimized software to exploit digital signal processing (DSP) extensions of modern instruction set architectures (ISAs). As such, recent research proposed optimized libraries for QNNs (from 8-bit to 2-bit) such as CMSIS-NN and PULP-NN. This work presents an extension to the PULP-NN library targeting the acceleration of mixed-precision Deep Neural Networks, an emerging paradigm able to significantly shrink the memory footprint of deep neural networks with negligible accuracy loss. The library, composed of 27 kernels, one for each permutation of input feature map, weight, and output feature map precision (considering 8-bit, 4-bit and 2-bit), enables efficient inference of QNNs on parallel ultra-low-power (PULP) clusters of RISC-V based processors, featuring the RV32IMCXpulpV2 ISA. The proposed solution, benchmarked on an 8-core GAP-8 PULP cluster, reaches peak performance of 16 MACs/cycle on 8 cores, performing 21× to 25× faster than an STM32H7 (powered by an ARM Cortex M7 processor) with 15× to 21× better energy efficiency.
Citations: 15
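The count of 27 kernels in the abstract of the entry above follows from choosing one of three precisions independently for inputs, weights, and outputs (3 × 3 × 3). The sketch below enumerates those combinations and shows one way sub-byte values can be packed to shrink memory footprint. The kernel naming scheme and the lowest-bits-first packing layout are assumptions for illustration, not the PULP-NN library's actual API or data layout.

```python
from itertools import product

PRECISIONS = (8, 4, 2)  # bit widths considered in the abstract

# One configuration per (input, weight, output) precision choice.
kernel_configs = list(product(PRECISIONS, repeat=3))


def kernel_name(in_bits, w_bits, out_bits):
    """Hypothetical naming scheme, not the library's real symbol names."""
    return f"conv_in{in_bits}_w{w_bits}_out{out_bits}"


def pack(values, bits):
    """Pack unsigned `bits`-wide values into bytes, lowest bits first.

    Assumed layout for illustration; `bits` must divide 8.
    """
    per_byte = 8 // bits
    mask = (1 << bits) - 1
    packed = bytearray()
    for i in range(0, len(values), per_byte):
        byte = 0
        for j, value in enumerate(values[i:i + per_byte]):
            byte |= (value & mask) << (j * bits)
        packed.append(byte)
    return bytes(packed)


print(len(kernel_configs))
print(kernel_name(*kernel_configs[0]))

# 16 weights take 16 bytes at 8 bits, 8 bytes at 4 bits, 4 bytes at 2 bits.
weights = [w % 4 for w in range(16)]
for bits in PRECISIONS:
    print(bits, len(pack(weights, bits)))
```

Packing four 2-bit weights per byte is what gives the 4× footprint reduction relative to 8-bit storage.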
HiLSM
Proceedings of the 17th ACM International Conference on Computing Frontiers Pub Date : 2020-05-11 DOI: 10.1145/3387902.3392621
Wenjie Li, Dejun Jiang, Jin Xiong, Yungang Bao
Abstract: To ensure data durability and crash consistency, LSM-tree based key-value stores suffer from high WAL synchronization overhead. Fortunately, the advent of NVM offers an opportunity to address this issue. However, NVM is currently too expensive to meet the demand of massive storage systems. Therefore, a hybrid NVM and SSD storage system provides a more cost-efficient solution. This paper proposes HiLSM, a key-value store for hybrid NVM-SSD storage systems. According to the characteristics of hybrid storage media, HiLSM adopts hybrid data structures consisting of log-structured memory and the LSM-tree. To address write stalls in write-intensive scenarios, a fine-grained data migration strategy is proposed so that data migration starts as early as possible. To address the performance gap between NVM and SSD, a multi-threaded data migration strategy is proposed so that data migration completes as soon as possible. To address the LSM-tree's inherent write amplification, a data filtering strategy is proposed so that data updates are absorbed in NVM as much as possible. We compare HiLSM with state-of-the-art key-value stores via extensive experiments, and the results show that HiLSM achieves 1.3x higher throughput for writes, 10x higher throughput for reads, and 79% less write traffic under the skewed workload.
Citations: 19
SoundFactory
Proceedings of the 17th ACM International Conference on Computing Frontiers Pub Date : 2020-05-11 DOI: 10.1145/3387902.3394036
A. Scionti, S. Ciccia, O. Terzo
Abstract: The proliferation of smart connected devices using digital assistants activated by voice commands (e.g., Apple Siri, Google Assistant, Amazon Alexa) is raising interest in algorithms to localize and recognize audio sources. Among others, deep neural networks (DNNs) are seen as a promising approach to accomplish this task. Unlike other approaches, DNNs can categorize received events, thus discriminating between events of interest and others, even in the presence of noise. Despite their advantages, DNNs require large datasets to be trained. Thus, tools for generating datasets are of great value, being able to accelerate the development of advanced learning models. This paper presents SoundFactory, a framework for simulating the propagation of sound waves (also considering noise, reverberation, reflection, attenuation, and other interfering waves) and the microphone array response to such sound waves. As such, SoundFactory makes it easy to generate datasets to train the deep neural networks that underlie modern applications. SoundFactory is flexible enough to simulate many different microphone array configurations, thus covering a large set of use cases. To demonstrate the capabilities offered by SoundFactory, we generated a dataset and trained two different (rather simple) learning models on it, achieving up to 97% accuracy. The quality of the generated dataset has also been assessed by comparing the microphone array model responses with real ones.
Citations: 1
Efficient architecture design for the AES-128 algorithm on embedded systems
Proceedings of the 17th ACM International Conference on Computing Frontiers Pub Date : 2020-05-11 DOI: 10.1145/3387902.3392624
Rupam Mondal, H. Ngo, James Shey, R. Rakvic, Owens Walker, Dane Brown
Abstract: Many applications make use of the edge devices in wireless sensor networks (WSNs), including video surveillance, traffic monitoring and enforcement, personal and health care, gaming, habitat monitoring, and industrial process control. However, these edge devices are resource-limited embedded systems that require a low-cost, low-power, and high-performance encryption/decryption solution to prevent attacks such as eavesdropping, message modification, and impersonation. This paper proposes a field-programmable gate array (FPGA) based design and implementation of the Advanced Encryption Standard (AES) algorithm for encryption and decryption using a parallel-pipeline architecture with a data forwarding mechanism that efficiently utilizes on-chip memory modules and massive parallel processing units to support a high throughput rate. Hardware designs that optimize the implementation of the AES algorithm are proposed to minimize resource allocation and maximize throughput. These designs are shown to outperform existing solutions in the literature. Additionally, a rapid prototype of a complete system-on-chip (SoC) solution that employs the proposed design on a configurable platform has been developed and proven to be suitable for real-time applications.
Citations: 3
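The abstract of the entry above concerns a hardware architecture and does not describe its datapath in detail. As a software reference point for one AES building block that such pipelines typically replicate, the sketch below implements multiplication in GF(2^8) with the AES reduction polynomial and the MixColumns transform on a single column. It is a plain reference model, not the paper's FPGA design, and the function names are illustrative.

```python
def xtime(a):
    """Multiply by x (i.e. {02}) in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    if a & 0x100:
        a ^= 0x11B  # reduce by the AES polynomial
    return a & 0xFF


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) by shift-and-add over the bits of b."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def mix_column(col):
    """Apply the AES MixColumns matrix to one 4-byte column."""
    a0, a1, a2, a3 = col
    return [
        gf_mul(a0, 2) ^ gf_mul(a1, 3) ^ a2 ^ a3,
        a0 ^ gf_mul(a1, 2) ^ gf_mul(a2, 3) ^ a3,
        a0 ^ a1 ^ gf_mul(a2, 2) ^ gf_mul(a3, 3),
        gf_mul(a0, 3) ^ a1 ^ a2 ^ gf_mul(a3, 2),
    ]


# Worked multiplication examples from the AES specification.
print(hex(gf_mul(0x57, 0x83)))
print(hex(gf_mul(0x57, 0x13)))
print([hex(b) for b in mix_column([0xDB, 0x13, 0x53, 0x45])])
```

Because each output byte depends only on one input column, the four columns of the AES state can be processed independently, which is one reason this step lends itself to parallel hardware.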