Proceedings of the ACM International Conference on Computing Frontiers: Latest Papers

Towards low-power embedded vector processor

Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2903485

Milan Stanic, Oscar Palomar, Timothy Hayes, Ivan Ratković, O. Unsal, A. Cristal

Citations: 2

Applications of supervised learning techniques on undergraduate admissions data

Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2911717

T. Lux, Randall Pittman, Maya Shende, Anil M. Shende

Abstract: In making undergraduate admissions decisions, colleges and universities must take a large amount of data into consideration for each applicant. Surprisingly, almost no work has been reported in the literature on the systematic, automated use of the wealth of data an institution gathers over the years; such a system could guide admissions offices in targeting applicants so that their yield (the applicants who enroll) is maximized by effectively distributing resources (counselors' time and energy) across applicants. We discuss the use of supervised learning techniques, namely perceptrons and support vector machines, in predicting admission decisions and enrollment based on historical applicant data. We show through experimental results that a classifier, trained and validated on previous years' data, can identify with reasonable accuracy (1) those applicants that the admissions office is likely to accept (based on historical decisions made by the admissions office), and (2) of the accepted applicants, those who are likely to enroll at the institution. Additionally, the results from our feature selection experiments can inform admissions offices of the significance of applicant features relative to acceptance and enrollment, thus aiding the office in future data collection and decision making.
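The abstract above names perceptrons as one of the supervised learning techniques applied to applicant data. As a rough illustration only, the following sketch trains a plain perceptron on a tiny made-up dataset. The two features (normalized GPA and test score), the sample values, and the function names are all hypothetical and are not taken from the paper.

```python
# Minimal perceptron sketch. All data and feature names are hypothetical;
# this is not the classifier or the dataset used in the paper.

def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Train a perceptron with the classic mistake-driven update.

    samples: list of feature tuples; labels: list of +1 / -1.
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * activation <= 0:  # misclassified (or on the boundary)
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
                mistakes += 1
        if mistakes == 0:  # a full pass without errors: converged
            break
    return weights, bias


def predict(weights, bias, x):
    """Return +1 (e.g. 'likely accepted') or -1 (e.g. 'likely rejected')."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else -1


# Hypothetical historical records: (normalized GPA, normalized test score).
train_x = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.9), (0.3, 0.4), (0.2, 0.5), (0.4, 0.2)]
train_y = [1, 1, 1, -1, -1, -1]

weights, bias = train_perceptron(train_x, train_y)
predictions = [predict(weights, bias, x) for x in train_x]
```

Because this toy data is linearly separable, the perceptron reaches zero training errors after a few passes. Real admissions data would need held-out validation on later years, as the abstract describes.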

Citations: 13

Accelerating graph applications on integrated GPU platforms via instrumentation-driven optimizations

Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2903152

N. Farooqui, Indrajit Roy, Yuan Chen, V. Talwar, K. Schwan

Abstract: Integrated GPU platforms are a cost-effective and energy-efficient option for accelerating data-intensive applications. While these platforms reduce the overhead of offloading computation to the GPU and offer potential for fine-grained resource scheduling, several open challenges remain. First, substantial application knowledge is required to leverage GPU acceleration capabilities. Second, static application profiling is inadequate for extracting performance from graph applications that exhibit input-dependent, irregular runtime behaviors. Third, naive scheduling of applications on both CPU and GPU devices may degrade performance due to memory contention. We describe Luminar, a runtime, profile-guided approach to accelerating applications on integrated GPU platforms. By using efficient dynamic instrumentation, Luminar informs resource scheduling about current workload properties. Luminar yields improvements of up to 40% for irregular, graph-based applications, plus 21-80% improvements in throughput and 3-60% improvements in energy efficiency when scheduling a mix of applications.

Citations: 3

Automated instantiation of side-channel attacks countermeasures for software cipher implementations

Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2911707

G. Agosta, Alessandro Barenghi, Gerardo Pelosi

Citations: 1

On the design of scalable and reusable accelerators for big data applications

Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2906141

C. Pilato, Qirui Xu, Paolo Mantovani, G. D. Guglielmo, L. Carloni

Citations: 8

Application characterization at scale: lessons learned from developing a distributed open community runtime system for high performance computing

Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2903166

Joshua Landwehr, Joshua D. Suetterlein, A. Márquez, J. Manzano, G. Gao

Abstract: Since 2012, the U.S. Department of Energy's X-Stack program has been developing solutions including runtime systems, programming models, languages, compilers, and tools for the Exascale system software to address crucial performance and power requirements. Fine-grain programming models and runtime systems show great potential to efficiently utilize the underlying hardware. Thus, they are essential to many X-Stack efforts. An abundance of small tasks can better utilize the vast parallelism available on current and future machines. Moreover, finer tasks can recover faster and adapt better, due to a decrease in state and control. Nevertheless, current applications have been written to exploit old paradigms (such as Communicating Sequential Processes and Bulk Synchronous Parallel processing). To fully utilize the advantages of these new systems, applications need to be adapted to these new paradigms. As part of the applications' porting process, in-depth characterization studies, focused on both application characteristics and runtime features, need to take place to fully understand the application performance bottlenecks and how to resolve them. This paper presents a characterization study for a novel high performance runtime system, called the Open Community Runtime (OCR), using key HPC kernels as its vehicle. This study makes the following contributions: one of the first high performance, fine-grain, distributed memory runtime systems implementing the OCR standard (version 0.99a); and a characterization study of key HPC kernels in terms of runtime primitives running in both intra-node and inter-node environments. Running on a general purpose cluster, we have found up to 1635x relative speed-up for a parallel tiled Cholesky kernel on 128 nodes with 16 cores each and a 1864x relative speed-up for a parallel tiled Smith-Waterman kernel on 128 nodes with 30 cores.
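To put the Cholesky figure above in context, the short sketch below converts a speed-up into parallel efficiency (speed-up divided by core count). It assumes the reported relative speed-up is measured against a single-core run, which the abstract does not state. The function name is illustrative. The Smith-Waterman figure is left out because the abstract does not say whether the 30 cores are per node.

```python
def parallel_efficiency(speedup, nodes, cores_per_node):
    """Parallel efficiency = speed-up / total cores.

    Assumes the speed-up baseline is one core; this is an assumption,
    not something the abstract confirms.
    """
    total_cores = nodes * cores_per_node
    return speedup / total_cores


# Cholesky kernel: 1635x on 128 nodes with 16 cores each (2048 cores).
cholesky_cores = 128 * 16
cholesky_efficiency = parallel_efficiency(1635, 128, 16)
print(f"{cholesky_cores} cores, efficiency {cholesky_efficiency:.2f}")
```

Under that single-core-baseline assumption, the Cholesky run would correspond to roughly 80% parallel efficiency on 2048 cores. A different baseline (for example, a single node) would change this number.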

Citations: 12

Resolving frontier problems of mastering large-scale supercomputer complexes

Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2903481

D. Nikitenko, V. Voevodin, S. Zhumatiy

Abstract: Managing and administering large-scale HPC centers is a complicated problem. Using a number of independent tools to resolve its seemingly independent subproblems can become a bottleneck as the scale of systems, the number of hardware and software components, the variety of user applications and license types, the number of users and workgroups, and so on grow rapidly. The developed toolkit is designed to help resolve routine problems in managing and administering any supercomputer center, from a stand-alone system up to top-ranked supercomputer centers that include a number of entirely different HPC systems. The toolkit implements a flexibly configurable variety of essential tools in a single interface. It also provides useful means of automating typical multi-step administration and management procedures. Another important design and implementation feature is that the toolkit can be installed and used without any significant changes to existing administration tools and system software. The toolkit is not integrated with the system software of the target machines; it runs on a remote server and executes scripts on the HPC systems via SSH as a dedicated user with limited access permissions to perform certain actions. This greatly reduces the possibility of security issues and takes care of many fault-tolerance issues, which are among the key challenges on the road to Exascale. At the same time, this lets the administrator perform any operation with the tools appropriate to the situation, whether ours or any other available tool. Evaluation of the developed system demonstrated its practicality in an HPC center with petaflop-level supercomputers, thousands of active researchers from a diversity of institutions, and several hundred applied projects.

Citations: 18

Scalable 2D K-SVD parallel algorithm for dictionary learning on GPUs

Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2903176

Lu He, Timothy Miskell, R. Liu, Hengyong Yu, Huijuan Xu, Yan Luo

Abstract: In recent years, the K-SVD algorithm for dictionary learning has been widely used in the field of image processing. The learning algorithm constructs a dictionary consisting of groups of signal atoms derived from a set of images. Sparse linear combinations of the signal atoms are used to construct the best possible match to the original images. The myriad applications of the K-SVD algorithm include reconstruction, compression, denoising, sparse coding, super resolution, and feature extraction. The K-SVD algorithm is a serial machine learning algorithm in which the signal atoms are trained in succession. All of the signal atoms are updated once within any given iteration. Given that the algorithmic complexity of one iteration is O(n^4), the training phase of the K-SVD algorithm is time-intensive. To speed up the K-SVD algorithm and reduce the run time of each iteration, this paper proposes a parallel version of the K-SVD algorithm and verifies its validity. We design and optimize the parallel algorithm on an Nvidia Titan X GPU by employing three strategies: batches, early stop, and streaming. Experimental results indicate that the parallel algorithm achieves a speedup of 80x compared to a multi-threaded MATLAB implementation of the K-SVD algorithm running on a quad-core CPU.

Citations: 4

Using colored petri nets for GPGPU performance modeling

Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2903167

S. Madougou, A. Varbanescu, C. D. Laat

Abstract: Performance analysis and modeling of applications running on GPUs is still a challenge for most designers and developers. State-of-the-art solutions are dominated by two classic approaches: statistical models that require a lot of training and profiling on existing hardware, and analytical models that require in-depth knowledge of the hardware platform and significant calibration. Both of these classes separate the application from the hardware and attempt a high-level combination of the two models for performance prediction. In this work, we propose an orthogonal approach, based on high-level simulation. Specifically, we use Colored Petri Nets (CPN) to model both the hardware and the application. Using this model, the execution of the application is a simulation of the CPN model using warps as tokens. Our prototype implementation of this modeling approach demonstrates promising results on a few case studies on two different GPU architectures: both reasonably accurate predictions and detailed execution information are obtained. We conclude that CPN-based GPU performance modeling is an elegant solution for systematic performance prediction, and we focus further on optimizing the models to improve the execution time of the symbolic simulation.
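The idea of simulating execution with warps as tokens can be illustrated with a very small token-passing sketch. This is a heavily simplified stand-in, not the paper's CPN model: the places (`ready`, `done`), the single issue transition, the token "color" (warp id plus remaining instruction count), and all parameter names are assumptions made for illustration.

```python
from collections import deque


def simulate(num_warps, instructions_per_warp, issue_slots):
    """Simulate warp tokens moving through a toy net; return (cycles, done).

    Hypothetical model: each cycle, up to `issue_slots` warp tokens are
    taken from the 'ready' place, execute one instruction, and are put
    back in 'ready' or moved to 'done'. Assumes instructions_per_warp >= 1.
    """
    # A token's "color" is (warp_id, remaining_instructions).
    ready = deque((w, instructions_per_warp) for w in range(num_warps))
    done = []
    cycles = 0
    while ready:
        # Transition "issue": fires once per free slot, consuming a token.
        issued = [ready.popleft() for _ in range(min(issue_slots, len(ready)))]
        # Each issued warp executes one instruction during this cycle.
        for warp_id, remaining in issued:
            if remaining > 1:
                ready.append((warp_id, remaining - 1))  # back to 'ready'
            else:
                done.append(warp_id)  # token moves to the 'done' place
        cycles += 1
    return cycles, done


cycles, finished = simulate(num_warps=4, instructions_per_warp=3, issue_slots=2)
print(cycles, finished)
```

With 4 warps of 3 instructions each and 2 issue slots, the 12 instructions are issued two per cycle, so the sketch reports 6 cycles. A real CPN model would add places and transitions for memory, scheduling, and other hardware resources.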

Citations: 4

Conserving cooling and computing power by distributing workloads in data centers

Proceedings of the ACM International Conference on Computing Frontiers Pub Date : 2016-05-16 DOI: 10.1145/2903150.2903177

Ruihong Lin, Yuhui Deng, Liyao Yang

Abstract: Reducing power consumption has become one of the most important challenges in designing modern data centers due to the explosive growth of data. Traditional approaches to decreasing power consumption normally do not consider the power of IT devices and the power of the cooling system simultaneously. In contrast to existing work, this paper proposes a power model that can minimize the overall power consumption of data centers by balancing computing power and cooling power. Furthermore, an Enhanced Genetic Algorithm (EGA) is designed to explore the solution space of the power model since the model is a linear programming problem. However, EGA is computationally intensive and its performance gradually decreases as the problem size grows. Therefore, a Heuristic Greedy Sequence (HGS) is proposed to simplify the calculation by leveraging a greedy strategy. In contrast to EGA, HGS can determine the workload allocation of a specific data center layout with only one calculation. Experimental results demonstrate that both EGA and HGS can significantly reduce the power consumption of data centers compared to a random algorithm. Additionally, HGS significantly outperforms EGA.

Citations: 2