Proceedings of the ACM International Conference on Computing Frontiers最新文献

筛选
英文 中文
An architecture for near-data processing systems 近数据处理系统的体系结构
Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2903478
E. Vermij, C. Hagleitner, Leandro Fiorin, R. Jongerius, J. V. Lunteren, K. Bertels
{"title":"An architecture for near-data processing systems","authors":"E. Vermij, C. Hagleitner, Leandro Fiorin, R. Jongerius, J. V. Lunteren, K. Bertels","doi":"10.1145/2903150.2903478","DOIUrl":"https://doi.org/10.1145/2903150.2903478","url":null,"abstract":"Near-data processing is a promising paradigm to address the bandwidth, latency, and energy limitations in today's computer systems. In this work, we introduce an architecture that enhances a contemporary multi-core CPU with new features for supporting a seamless integration of near-data processing capabilities. Crucial aspects such as coherency, data placement, communication, address translation, and the programming model are discussed. The essential components, as well as a system simulator, are realized in hardware and software. Results for the important Graph500 benchmark show a 1.5x speedup when using the proposed architecture.","PeriodicalId":226569,"journal":{"name":"Proceedings of the ACM International Conference on Computing Frontiers","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126115154","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 9
Heterogeneous chip multiprocessor architectures for big data applications 面向大数据应用的异构芯片多处理器架构
Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2908078
H. Homayoun
{"title":"Heterogeneous chip multiprocessor architectures for big data applications","authors":"H. Homayoun","doi":"10.1145/2903150.2908078","DOIUrl":"https://doi.org/10.1145/2903150.2908078","url":null,"abstract":"Emerging big data analytics applications require a significant amount of server computational power. The costs of building and running a computing server to process big data and the capacity to which we can scale it are driven in large part by those computational resources. However, big data applications share many characteristics that are fundamentally different from traditional desktop, parallel, and scale-out applications. Big data analytics applications rely heavily on specific deep machine learning and data mining algorithms, and are running a complex and deep software stack with various components (e.g. Hadoop, Spark, MPI, Hbase, Impala, MySQL, Hive, Shark, Apache, and MangoDB) that are bound together with a runtime software system and interact significantly with I/O and OS, exhibiting high computational intensity, memory intensity, I/O intensity and control intensity. Current server designs, based on commodity homogeneous processors, will not be the most efficient in terms of performance/watt for this emerging class of applications. In other domains, heterogeneous architectures have emerged as a promising solution to enhance energy-efficiency by allowing each application to run on a core that matches resource needs more closely than a one-size-fits-all core. A heterogeneous architecture integrates cores with various micro-architectures and accelerators to provide more opportunity for efficient workload mapping. In this work, through methodical investigation of power and performance measurements, and comprehensive system level characterization, we demonstrate that a heterogeneous architecture combining high performance big and low power little cores is required for efficient big data analytics applications processing, and in particular in the presence of accelerators and near real-time performance constraints.","PeriodicalId":226569,"journal":{"name":"Proceedings of the ACM International Conference on Computing Frontiers","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124757179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
Does it sound as it claims: a detailed side-channel security analysis of QuadSeal countermeasure 它听起来像它声称的那样:对QuadSeal反措施的详细侧信道安全分析
Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2911709
Darshana Jayasinghe, S. Bhasin, S. Parameswaran, A. Ignjatović
{"title":"Does it sound as it claims: a detailed side-channel security analysis of QuadSeal countermeasure","authors":"Darshana Jayasinghe, S. Bhasin, S. Parameswaran, A. Ignjatović","doi":"10.1145/2903150.2911709","DOIUrl":"https://doi.org/10.1145/2903150.2911709","url":null,"abstract":"VLSI systems often rely on embedded cryptographic cores for security when the confidentiality and authorization is a must. Such cores are theoretically sound but often vulnerable to physical attacks like side-channel analysis (SCA). Several countermeasures have been previously proposed to protect these cryptographic cores. QuadSeal was proposed as an algorithmic balancing technique to thwart power analysis attacks on block cipher algorithms. QuadSeal can be implemented either in hardware or software and it was previously shown on Advanced Encryption Standard (AES) (referred as QuadSeal-AES) to be resistant against power analysis attacks (Correlation Power Analsis and Mutual Information Analysis). In this paper, we analyze QuadSeal against SCA (against power analysis attacks) using leakage detection techniques as well as Correlation Power Analysis with success rates. Our results show that QuadSeal has leakages; however CPA with success rate attack was unable to exploit the leakages efficiently.","PeriodicalId":226569,"journal":{"name":"Proceedings of the ACM International Conference on Computing Frontiers","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116220450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
Area-energy tradeoffs of logic wear-leveling for BTI-induced aging bti诱导老化逻辑磨损均衡的面积-能量权衡
Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2903171
R. Ashraf, N. Khoshavi, Ahmad Alzahrani, R. Demara, S. Kiamehr, M. Tahoori
{"title":"Area-energy tradeoffs of logic wear-leveling for BTI-induced aging","authors":"R. Ashraf, N. Khoshavi, Ahmad Alzahrani, R. Demara, S. Kiamehr, M. Tahoori","doi":"10.1145/2903150.2903171","DOIUrl":"https://doi.org/10.1145/2903150.2903171","url":null,"abstract":"Ensuring operational reliability in the presence of Bias Temperature Instability (BTI) effects often results in a compromise either in the form of lower performance and/or higher energy-consumption. This is due to the performance degradation over time caused by BTI effects which needs to be compensated through frequency, voltage, or area margining to meet the circuit's timing specification till end of operational lifetime. In this paper, a circuit-level approach referred to as Logic-Wear-Leveling (LWL) utilizes Dark-Silicon to mitigate BTI effects in logic datapaths. LWL introduces fine-grained spatial redundancy in timing vulnerable logic components, and leverages it at runtime to enable post-Silicon adaptability. The activation interval of redundant datapaths allows for controlled stress and recovery phases. This produces a wear-leveling effect which helps to reduce the BTI induced performance degradation over time, which in turn helps to reduce the design margins. This approach demonstrates a significant reduction in energy consumption of up to 31.98% at 10 years as compared to conventional voltage guardbanding approach. The benefit of energy reduction is also assessed against the area overheads of spatial redundancy.","PeriodicalId":226569,"journal":{"name":"Proceedings of the ACM International Conference on Computing Frontiers","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121960476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 8
P-Socket: optimizing a communication library for a PCIe-based intra-rack interconnect P-Socket:优化基于pcie的机架内互连的通信库
Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2903168
Liuhang Zhang, Rui Hou, S. Mckee, Jianbo Dong, Lixin Zhang
{"title":"P-Socket: optimizing a communication library for a PCIe-based intra-rack interconnect","authors":"Liuhang Zhang, Rui Hou, S. Mckee, Jianbo Dong, Lixin Zhang","doi":"10.1145/2903150.2903168","DOIUrl":"https://doi.org/10.1145/2903150.2903168","url":null,"abstract":"Data centers require efficient, low-cost, flexible interconnects to manage the rapidly growing internal traffic generated by an increasingly diverse set of applications. To meet these requirements, data center networks are increasingly employing alternatives such as RapidIO, Freedom, and PCIe, which require fewer physical devices and/or have simpler protocols than more traditional interconnects. These networks offer raw high performance communication capabilities, but simply using them for conventional TCP/IP-based communication fails to realize the potential performance of the physical network. Here we analyze causes for this performance loss for the TCP/IP protocol over one such fabric, PCIe, and we explore a hardware/software solution that mitigates overheads and exploits PCIe's advanced features. The result is P-Socket, an efficient library that enables legacy socket applications to run without modification. Our experiments show that P-Socket achieves an end-to-end latency of 1.2μs and effective bandwidth of up to 2.87GB/s (out of a theoretical peak of 3.05GB/s).","PeriodicalId":226569,"journal":{"name":"Proceedings of the ACM International Conference on Computing Frontiers","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131242585","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
Boosting performance of directory-based cache coherence protocols with coherence bypass at subpage granularity and a novel on-chip page table 提高基于目录的缓存一致性协议在子页面粒度上的一致性绕过和一个新的片上页表的性能
Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2903175
M. Soltaniyeh, I. Kadayif, Özcan Özturk
{"title":"Boosting performance of directory-based cache coherence protocols with coherence bypass at subpage granularity and a novel on-chip page table","authors":"M. Soltaniyeh, I. Kadayif, Özcan Özturk","doi":"10.1145/2903150.2903175","DOIUrl":"https://doi.org/10.1145/2903150.2903175","url":null,"abstract":"Chip multiprocessors (CMPs) require effective cache coherence protocols as well as fast virtual-to-physical address translation mechanisms for high performance. Directory-based cache coherence protocols are the state-of-the-art approaches in many-core CMPs to keep the data blocks coherent at the last level private caches. However, the area overhead and high associativity requirement of the directory structures may not scale well with increasingly higher number of cores. As shown in some prior studies, a significant percentage of data blocks are accessed by only one core, therefore, it is not necessary to keep track of these in the directory structure. In this study, we have two major contributions. First, we show that compared to the classification of cache blocks at page granularity as done in some previous studies, data block classification at subpage level helps to detect considerably more private data blocks. Consequently, it reduces the percentage of blocks required to be tracked in the directory significantly compared to similar page level classification approaches. This, in turn, enables smaller directory caches with lower associativity to be used in CMPs without hurting performance, thereby helping the directory structure to scale gracefully with the increasing number of cores. Memory block classification at subpage level, however, may increase the frequency of the Operating System's (OS) involvement in updating the maintenance bits belonging to subpages stored in page table entries, nullifying some portion of performance benefits of subpage level data classification. To overcome this, we propose a distributed on-chip page table as a our second contribution.","PeriodicalId":226569,"journal":{"name":"Proceedings of the ACM International Conference on Computing Frontiers","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114842905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
Prototyping real-time tracking systems on mobile devices 移动设备上的实时跟踪系统原型
Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2903471
Kyunghun Lee, Haifa Ben Salem, T. Damarla, W. Stechele, S. Bhattacharyya
{"title":"Prototyping real-time tracking systems on mobile devices","authors":"Kyunghun Lee, Haifa Ben Salem, T. Damarla, W. Stechele, S. Bhattacharyya","doi":"10.1145/2903150.2903471","DOIUrl":"https://doi.org/10.1145/2903150.2903471","url":null,"abstract":"In this paper, we address the design an implementation of low power embedded systems for real-time tracking of humans and vehicles. Such systems are important in applications such as activity monitoring and border security. We motivate the utility of mobile devices in prototyping the targeted class of tracking systems, and demonstrate a dataflow-based and cross-platform design methodology that enables efficient experimentation with key aspects of our tracking system design, including real-time operation, experimentation with advanced sensors, and streamlined management of design versions on host and mobile platforms. Our experiments demonstrate the utility of our mobile-device-targeted design methodology in validating tracking algorithm operation; evaluating real-time performance, energy efficiency, and accuracy of tracking system execution; and quantifying trade-offs involving use of advanced sensors, which offer improved sensing accuracy at the expense of increased cost and weight. Additionally, through application of a novel, cross-platform, model-based design approach, our design requires no change in source code when migrating from an initial, host-computer-based functional reference to a fully-functional implementation on the targeted mobile device.","PeriodicalId":226569,"journal":{"name":"Proceedings of the ACM International Conference on Computing Frontiers","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114901366","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
A lightweight user tracking method for app providers 应用程序提供商的轻量级用户跟踪方法
Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2903484
R. M. Frey, Runhua Xu, A. Ilic
{"title":"A lightweight user tracking method for app providers","authors":"R. M. Frey, Runhua Xu, A. Ilic","doi":"10.1145/2903150.2903484","DOIUrl":"https://doi.org/10.1145/2903150.2903484","url":null,"abstract":"Since 2013, Google and Apple no longer allow app providers to use the persistent device identifiers (Android ID and UDID) for user tracking on mobile devices. Other tracking options provoke either severe privacy concerns, need additional hardware or are only practicable by a limited number of companies. In this paper, we present a lightweight method that overcomes these weaknesses by using the set of installed apps on a device to create a unique fingerprint. The method was evaluated in a field study with 2410 users and 175,658 installed apps in total. The sets of these installed apps are unique in 99.75% of all inspected users. Furthermore, by reducing the granularity from apps to app categories to lessen users' privacy concerns, the results remain highly unique with an identification rate of 96.22%. Since the information of installed apps and app categories on each device is freely available for any app developer, the method is a valuable instrument for app providers.","PeriodicalId":226569,"journal":{"name":"Proceedings of the ACM International Conference on Computing Frontiers","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123477264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
IVM: a task-based shared memory programming model and runtime system to enable uniform access to CPU-GPU clusters IVM:基于任务的共享内存编程模型和运行时系统,支持对CPU-GPU集群的统一访问
Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2903174
Kittisak Sajjapongse, Ruidong Gu, M. Becchi
{"title":"IVM: a task-based shared memory programming model and runtime system to enable uniform access to CPU-GPU clusters","authors":"Kittisak Sajjapongse, Ruidong Gu, M. Becchi","doi":"10.1145/2903150.2903174","DOIUrl":"https://doi.org/10.1145/2903150.2903174","url":null,"abstract":"GPUs have been widely used to accelerate a variety of applications from different domains and have become part of high-performance computing clusters. Yet, the use of GPUs within distributed applications still faces significant challenges in terms of programmability and performance portability. The use of popular programming models for distributed applications (such as MPI, SHMEM, and Charm++) in combination with GPU programming frameworks (such as CUDA and OpenCL) exposes to the programmer disjoint memory address spaces and provides a non-uniform view of compute resources (i.e., CPUs and GPUs). In addition, these programming models often perform static assignment of tasks to compute resources and require significant programming effort to embed dynamic scheduling and load balancing mechanisms within the application. In this work, we propose a programming framework called Inter-node Virtual Memory (IVM) that provides the programmer with a uniform view of compute resources and memory spaces within a CPU-GPU cluster, and a mechanism to easily incorporate load balancing within the application. We compare MPI, Charm++ and IVM on four distributed GPU applications. Our experimental results show that, while the main goal of IVM is programmer productivity, the use of the load balancing mechanisms offered by this framework can also lead to performance gains over existing frameworks.","PeriodicalId":226569,"journal":{"name":"Proceedings of the ACM International Conference on Computing Frontiers","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117126369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
Accelerating the 3D euler atmospheric solver through heterogeneous CPU-GPU platforms 基于异构CPU-GPU平台的三维欧拉大气求解器加速
Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2903480
Jingheng Xu, H. Fu, L. Gan, Chao Yang, Wei Xue, Guangwen Yang
{"title":"Accelerating the 3D euler atmospheric solver through heterogeneous CPU-GPU platforms","authors":"Jingheng Xu, H. Fu, L. Gan, Chao Yang, Wei Xue, Guangwen Yang","doi":"10.1145/2903150.2903480","DOIUrl":"https://doi.org/10.1145/2903150.2903480","url":null,"abstract":"In climate change studies, the atmospheric model is an essential component for building a high-resolution climate simulation system. While the accuracy of atmospheric simulations has long been limited by the computational capabilities of CPU platforms, the heterogeneous platforms equipped with accelerators are becoming promising candidates for achieving high simulating performance. However, due to the complex algorithms and the heavy communications, atmospheric developers have to face to the tough challenges from both the algorithmic and architectural aspects. In this paper, we propose a hybrid algorithm to accelerate the solver of Euler atmospheric equations, which are the most essential equation sets to simulate the mesoscale atmospheric dynamics. Based on the heterogeneous CPU-GPU platform, we develop a 3-dimensional domain decomposition mechanism, which can achieve more efficient utilization of the computing resources. Furthermore, an extensive set of optimization techniques is applied to boost the performance of the solver on both the host and accelerator side. Compared with the performance of fully-optimized two 6-core CPU version, the optimized Euler solver can achieve a speedup of 6.64x when running on a hybrid node with two 6-core Intel Xeon E5645 CPUs and one Tesla K20c GPU. In addition, a nearly linear weak scaling result is achieved on a cluster with 12 CPU-GPU nodes. The experimental results demonstrate promising possibility to apply heterogeneous architecture in the study of the atmospheric simulation.","PeriodicalId":226569,"journal":{"name":"Proceedings of the ACM International Conference on Computing Frontiers","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127135562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信