Proceedings of the ACM International Conference on Computing Frontiers: Latest Articles

Exploring embedded systems virtualization using MIPS virtualization module
Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2903179
C. Moratelli, S. J. Filho, Fabiano Hessel
Abstract: Embedded virtualization has emerged as a valuable way to increase security, reduce costs, improve software quality and decrease design time. The late adoption of hardware-assisted virtualization in embedded processors induced the development of hypervisors primarily based on para-virtualization. Recently, embedded processor designers developed virtualization extensions for their processor architectures similar to those adopted in cloud computing years ago. Now, the hypervisors are migrating to a mixed approach, where basic operating system functionalities take advantage of full-virtualization and advanced functionalities such as inter-domain communication remain para-virtualized. In this paper, we discuss the key features for embedded virtualization. We show how our embedded hypervisor was designed to support these features, taking advantage of the hardware-assisted virtualization available to the MIPS family of processors. Different aspects of our hypervisor are evaluated and compared to other similar approaches. A hardware platform was used to run benchmarks on virtualized instances of both Linux and an RTOS for performance analysis. Finally, the results obtained show that our hypervisor can be applied as a sound solution for the IoT.
Citations: 4
First impressions from detailed brain model simulations on a Xeon/Xeon-Phi node
Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2903477
G. Chatzikonstantis, D. Rodopoulos, Sofia Nomikou, C. Strydis, C. I. Zeeuw, D. Soudris
Abstract: The development of physiologically plausible neuron models comes with increased complexity, which poses a challenge for many-core computing. In this work, we have chosen an extension of the demanding Hodgkin-Huxley model for the neurons of the Inferior Olivary Nucleus, an area of vital importance for motor skills. The computing fabric of choice is an Intel Xeon-Xeon Phi system, widely-used in modern computing infrastructure. The target application is parallelized with combinations of MPI and OpenMP. The best configurations are scaled up to human InfOli numbers.
Citations: 5
Sub-PicoJoule per operation scalable computing: why, when, how?
Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2916035
L. Benini
Abstract: The "internet of everything" envisions trillions of connected objects loaded with high-bandwidth sensors requiring massive amounts of local signal processing, fusion, pattern extraction and classification. From the computational viewpoint, the challenge is formidable and can be addressed only by pushing computing fabrics toward massive parallelism and brain-like energy efficiency levels. CMOS technology can still take us a long way toward this vision. Our recent results with the open-source PULP (parallel ultra-low power) chips demonstrate that pJ/OP (GOPS/mW) computational efficiency is within reach in today's 28nm CMOS FDSOI technology. In this talk, I will look at the next 1000x of energy efficiency improvement, which will require heterogeneous 3D integration, mixed-signal, approximate processing and non-Von-Neumann architectures for scalable acceleration.
Citations: 0
Techniques for modulating error resilience in emerging multi-value technologies
Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2903154
Magnus Själander, Gustaf Borgström, M. Klymenko, F. Remacle, S. Kaxiras
Abstract: There exist extensive ongoing research efforts on emerging atomic scale technologies that have the potential to become an alternative to today's CMOS technologies. A common feature among the investigated technologies is that of multi-value devices, in particular, the possibility of implementing quaternary logic and memory. However, multi-value devices tend to be more sensitive to interferences and, thus, have reduced error resilience. We present an architecture based on multi-value devices where we can trade energy efficiency against error resilience. Important data are encoded in a more robust binary format while error tolerant data is encoded in a quaternary format. We show for eight benchmarks an average energy reduction of 14%, 20%, and 32% for the register file, level-one data cache, and main memory, respectively, and for three integer benchmarks, an energy reduction for arithmetic operations of up to 28%. We also show that for a quaternary technology to be viable a raw bit error rate of one error in 100 million or better is required.
Citations: 3
Scalable betweenness centrality on multi-GPU systems
Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2903153
M. Bernaschi, Giancarlo Carbone, Flavio Vella
Abstract: Betweenness Centrality (BC) is steadily growing in popularity as a metric of the influence of a vertex in a graph. The BC score of a vertex is proportional to the number of all-pairs-shortest-paths passing through it. However, complete and exact BC computation for a large-scale graph is an extraordinary challenge that requires high performance computing techniques to provide results in a reasonable amount of time. Our approach combines bi-dimensional (2-D) decomposition of the graph and multi-level parallelism together with a suitable data-thread mapping that overcomes most of the difficulties caused by the irregularity of the computation on GPUs. In order to reduce time and space requirements of BC computation, a heuristic based on the 1-degree reduction technique is developed as well. Experimental results on synthetic and real-world graphs show that the proposed techniques are well suited to compute BC scores in graphs which are too large to fit in the memory of a single computational node.
Citations: 24
vSIP: virtual scheduler for interactive performance
Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2903178
Yan Sui, Chun Yang, Ning Jia, Xu Cheng
Abstract: This paper presents vSIP, a new scheme of virtual desktop disk scheduling on shared storage systems for user-interactive performance. The proposed scheme enables requests to be dynamically prioritized based on the interactive feature of the applications sending them. To enhance user experience on consolidated desktops, our scheme provides interactive applications with priority requests, which have less latency in accessing storage than requests from non-interactive applications sharing the same storage. To this end, we devise a hypervisor extension that distinguishes interactive applications from non-interactive applications. Our framework prioritizes the requests from these applications and limits the request rate. Our evaluation shows that the proposed scheme significantly improves the interactive performance of storage-sensitive operations such as application launch, Web page loading and cold video playback, when other storage-intensive applications heavily disturb the interactive applications. In addition, we introduce a guest OS information transfer method, so that the efficiency and accuracy of the identification of interactive applications can be further improved.
Citations: 2
From FLOPS to BYTES: disruptive change in high-performance computing towards the post-Moore era
Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2906830
S. Matsuoka, H. Amano, K. Nakajima, Koji Inoue, T. Kudoh, N. Maruyama, K. Taura, Takeshi Iwashita, T. Katagiri, T. Hanawa, Toshio Endo
Abstract: The slowdown and inevitable end of exponential scaling of processor performance, the end of the so-called "Moore's Law", is predicted to occur in the 2025-2030 timeframe. Because CMOS semiconductor voltage is also approaching its limits, this means that logic transistor power will become constant, and as a result, the system FLOPS will cease to improve, resulting in serious consequences for IT in general, especially supercomputing. Existing attempts to overcome the end of Moore's law are rather limited in their future outlook or applicability. We claim that data-oriented parameters, such as bandwidth and capacity, or BYTES, are the new parameters that will allow continued performance gains for periods even after computing performance or FLOPS ceases to improve, due to continued advances in storage device technologies and optics, and manufacturing technologies including 3-D packaging. Such transition from FLOPS to BYTES will lead to disruptive changes in the overall systems from applications, algorithms, software to architecture, as to what parameter to optimize for, in order to achieve continued performance growth over time. We are launching a new set of research efforts to investigate and devise new technologies to enable such disruptive changes from FLOPS to BYTES in the Post-Moore era, focusing on HPC, where there is extreme sensitivity to performance, and expect the results to disseminate to the rest of IT.
Citations: 18
Exploring dataflow-based thread level parallelism in cyber-physical systems
Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2906829
R. Giorgi
Abstract: Smart Cyber-Physical Systems (SCPS) aim not only at integrating computational platforms and physical processes, but also at creating larger "systems of systems" capable of satisfying multiple critical constraints such as energy efficiency, high-performance, safety, security, size and cost. The AXIOM project aims at designing such systems by focusing on low-cost Single Board Computers (SBC), based on current System-on-Chips (SoC) that include programmable logic (FPGA), multi-core CPUs, accelerators and peripherals. A dataflow execution model, partially developed in the TERAFLUX project, brings a more predictable and reliable execution. The goals of AXIOM include: i) the possibility to easily program the system with a shared-memory model based on OmpSs; ii) the possibility of scaling up the system through a custom but inexpensive interconnect; iii) the possibility of accelerating a specific function on a single or multiple FPGAs of the system. The dataflow execution model operates at thread-level granularity. In this paper the AXIOM execution model and the related memory model are further detailed. The memory model is key for the execution of threads while reducing the need for data transfers. The preliminary results confirm the scalability of this model.
Citations: 10
Curbing the roofline: a scalable and flexible architecture for CNNs on FPGA
Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2911715
P. Meloni, Gianfranco Deriu, Francesco Conti, Igor Loi, L. Raffo, L. Benini
Abstract: Convolutional Neural Networks (CNNs) have reached outstanding results in several complex visual recognition tasks, such as classification and scene parsing. CNNs are composed of multiple filtering layers that perform 2D convolutions over input images. The intrinsic parallelism in such a computation kernel makes it suitable to be effectively accelerated on parallel hardware. In this paper we propose a highly flexible and scalable architectural template for acceleration of CNNs on FPGA devices, based on the cooperation between a set of software cores and a parallel convolution engine that communicate via a tightly coupled L1 shared scratchpad. Our accelerator structure, tested on a Xilinx Zynq XC-Z7045 device, delivers peak performance up to 80 GMAC/s, corresponding to 100 MMAC/s for each DSP slice in the programmable fabric. Thanks to the flexible architecture, convolution operations can be scheduled in order to reduce input/output bandwidth down to 8 bytes per cycle without degrading the performance of the accelerator in most of the meaningful use-cases.
Citations: 18
Accelerating the mining of influential nodes in complex networks through community detection
Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2903181
M. Halappanavar, A. Sathanur, A. Nandi
Abstract: Computing the set of influential nodes of a given size, which when activated will ensure maximal spread of influence on a complex network, is a challenging problem impacting multiple applications. A rigorous approach to influence maximization involves utilization of optimization routines that come with a high computational cost. In this work, we propose to exploit the existence of communities in complex networks to accelerate the mining of influential seeds. We provide intuitive reasoning to explain why our approach should be able to provide speedups without significantly degrading the extent of the spread of influence when compared to the case of influence maximization without using the community information. Additionally, we have parallelized the complete workflow by leveraging an existing parallel implementation of the Louvain community detection algorithm. We then conduct a series of experiments on a dataset with three representative graphs to first verify our implementation and then demonstrate the speedups. Our method achieves speedups ranging from 3x to 28x for graphs with a small number of communities while nearly matching or even exceeding the activation performance on the entire graph. Complexity analysis reveals that dramatic speedups are possible for larger graphs that contain a correspondingly larger number of communities. In addition to the speedups obtained from the utilization of the community structure, scalability results show up to 6.3x speedup on 20 cores relative to the baseline run on 2 cores. Finally, current limitations of the approach are outlined along with the planned next steps.
Citations: 11