Proceedings of the 18th ACM International Conference on Computing Frontiers: Latest Articles, Page 3

An online guided tuning approach to run CNN pipelines on edge devices

Proceedings of the 18th ACM International Conference on Computing Frontiers Pub Date : 2021-05-11 DOI: 10.1145/3457388.3458662

Pirah Noor Soomro, M. Abduljabbar, J. Castrillón, M. Pericàs

Abstract: Modern edge and mobile devices are equipped with powerful computing resources. These are often organized as heterogeneous multi-cores, featuring performance-asymmetric core clusters. This raises the question of how to effectively execute the inference pass of convolutional neural networks (CNN) on such devices. Existing CNN implementations on edge devices leverage offline profiling data to determine a better schedule for CNN applications. This approach requires a time-consuming phase of generating a performance profile for each type of representative kernel on the various core configurations available on the device, coupled with a search space exploration. We propose an online tuning technique which utilizes compile-time hints and online profiling data to generate high-throughput CNN pipelines. We explore core heterogeneity and compatible core-layer configurations through an online guided search. Unlike exhaustive search, we adopt an evolutionary approach with a guided starting point in order to find the solution. We show that by pruning and navigating through the complex search space using compile-time hints, 79% of the tested configurations turn out to be near-optimal candidates for a throughput-maximizing pipeline on the NVIDIA Jetson TX2 platform.
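The idea of an evolutionary search seeded by a guided starting point can be illustrated with a small sketch. This is not the authors' implementation: the cost model, the function names (`guided_start`, `evolve`, `bottleneck`) and the single-mutation loop are assumptions made for illustration. Static per-layer cost estimates stand in for the compile-time hints, and a computed stage time stands in for what the paper measures through online profiling.

```python
import random

def stage_times(layer_costs, splits, speeds):
    """Time of each pipeline stage: cost of its layers divided by its core speed."""
    bounds = [0, *splits, len(layer_costs)]
    return [sum(layer_costs[a:b]) / s
            for a, b, s in zip(bounds, bounds[1:], speeds)]

def bottleneck(layer_costs, splits, speeds):
    """Pipeline throughput is limited by the slowest stage (lower is better)."""
    return max(stage_times(layer_costs, splits, speeds))

def guided_start(layer_costs, speeds):
    """Hypothetical hint-based seed: give each stage a share of the total
    estimated cost proportional to the speed of the cores it runs on."""
    n_layers, n_stages = len(layer_costs), len(speeds)
    total_cost, total_speed = sum(layer_costs), sum(speeds)
    splits, prev = [], 0
    for k in range(n_stages - 1):
        target = total_cost * sum(speeds[:k + 1]) / total_speed
        lo = prev + 1                        # every stage gets at least one layer
        hi = n_layers - (n_stages - 1 - k)   # leave one layer per remaining stage
        cut = lo
        while cut < hi and sum(layer_costs[:cut]) < target:
            cut += 1
        splits.append(cut)
        prev = cut
    return splits

def evolve(layer_costs, speeds, generations=200, seed=0):
    """Mutate one split point at a time and keep the candidate only if the
    bottleneck stage gets faster. A real system would time the pipeline here."""
    rng = random.Random(seed)
    best = guided_start(layer_costs, speeds)
    best_t = bottleneck(layer_costs, best, speeds)
    if not best:                             # single stage: nothing to tune
        return best, best_t
    for _ in range(generations):
        cand = list(best)
        i = rng.randrange(len(cand))
        cand[i] += rng.choice((-1, 1))
        lo = cand[i - 1] + 1 if i > 0 else 1
        hi = cand[i + 1] - 1 if i + 1 < len(cand) else len(layer_costs) - 1
        if not lo <= cand[i] <= hi:          # would create an empty stage
            continue
        t = bottleneck(layer_costs, cand, speeds)
        if t < best_t:
            best, best_t = cand, t
    return best, best_t
```

With the made-up inputs `evolve([4, 2, 6, 3, 5, 1, 2, 7], [2.0, 1.0])`, the guided start already places the split after the fifth layer, both stages take 10.0 time units, and no mutation improves on it. That is the point of a good seed: the search begins close to a balanced pipeline rather than at a random configuration.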

Citations: 4

Dynamic row activation mechanism for multi-core systems

Proceedings of the 18th ACM International Conference on Computing Frontiers Pub Date : 2021-05-11 DOI: 10.1145/3457388.3458660

Tareq A. Alawneh, R. Kirner, C. Menon

Citations: 3

Parallel graph algorithms by blocks: from I/O to algorithms

Proceedings of the 18th ACM International Conference on Computing Frontiers Pub Date : 2021-05-11 DOI: 10.1145/3457388.3459987

Abdurrahman Yasar, Kasimir Gabert, Ümit V. Çatalyürek

Abstract: In today's data-driven world and heterogeneous computing environments, processing large-scale graphs in an architecture-agnostic manner has become more crucial than ever before. In terms of graph analytics frameworks, on the one side, there has been significant interest in developing hand-optimized high-performance computing solutions. On the systems side, following the big data movement and to bring parallel computing to the masses, researchers have proposed several graph processing and management systems to handle large-scale graphs. Hand-optimized HPC approaches require high expertise and are expensive to maintain and develop, and graph processing frameworks suffer from limited expressibility and performance. We propose Parallel Graph Algorithms by Blocks (PGAbB), a block-based graph algorithms framework for shared-memory, multi-core, multi-GPU machines. PGAbB offers a sweet spot between efficient parallelism and architecture-agnostic algorithm design for a wide class of graph problems while performing close to hand-optimized HPC implementations. While our PGAbB framework, as well as many other recent HPC graph-analytics frameworks, is highly tuned and able to run complex graph analytics in fractions of a second on billion-edge graphs, there remains a gap in their end-to-end use. Despite the significant improvements that modern hardware and operating systems have made towards input and output, reading the graph from file systems easily takes thousands of times longer than running the computational kernel itself. This slowdown causes both a disconnect for end users and a loss of productivity for researchers and developers. We close this gap by providing a simple-to-use, small, header-only, and dependency-free C++11 library, PIGO, that brings I/O improvements to graph and sparse matrix systems. Using PIGO, we improve the end-to-end performance for state-of-the-art systems significantly, in many cases by over 40X.

Citations: 0

Fault injection attacks on SoftMax function in deep neural networks

Proceedings of the 18th ACM International Conference on Computing Frontiers Pub Date : 2021-05-11 DOI: 10.1145/3457388.3458870

Dirmanto Jap, Yoo-Seung Won, S. Bhasin

Citations: 5

Near real-time intrusion alert aggregation using concept-based learning

Proceedings of the 18th ACM International Conference on Computing Frontiers Pub Date : 2021-05-11 DOI: 10.1145/3457388.3458663

Gordon Werner, S. Yang, K. McConky

Citations: 5

Leveraging ML to handle the increasing complexity of the cloud

Proceedings of the 18th ACM International Conference on Computing Frontiers Pub Date : 2021-05-11 DOI: 10.1145/3457388.3460425

Christina Delimitrou

Citations: 0

Interactive data science at scale

Proceedings of the 18th ACM International Conference on Computing Frontiers Pub Date : 2021-05-11 DOI: 10.1145/3457388.3459985

David A. Bader

Abstract: A real-world challenge in data science is to develop interactive methods for quickly analyzing new and novel data sets that are potentially of massive scale. In this talk, we discuss our development of suffix array and graph algorithms in the context of Arkouda, a NumPy-like replacement for interactive data science on tens of terabytes of data. Many real-world applications in bioinformatics, web information search and analysis, and lossless compression can be abstracted as string analysis. Suffix arrays are a very efficient data structure to support quick search of any string pattern. We have integrated the suffix array data structure (including its enhanced Longest Common Prefix (LCP) array) and the corresponding construction algorithm into Arkouda, thus providing Python users with a powerful method to support different types of string analysis. Our novel approach integrates a suffix array algorithm library into Arkouda so that the Arkouda runtime can select the large suffix array construction algorithm dynamically based on the dataset properties. Two of the implemented methods on the back-end of Arkouda include our novel O(n) time complexity skew algorithm in Chapel, and the DivSufSoft suffix array construction algorithm in C, which has higher time complexity but often is faster in practice. Experimental results show that, supported by Arkouda, Python users can easily build the suffix array and LCP array of a large-scale string in a Jupyter notebook without losing any performance compared with direct back-end operation. Our future work is extending our self-developed algorithm to support multi-locale parallel execution, so that our algorithm can handle large strings on distributed systems.

Graphs are widely used to abstract problems in domains such as social sciences, biological systems, and information systems. To support real-world large graph analysis in Arkouda, we first developed the array-based graph data structure, which can be used like an adjacency matrix or incidence matrix but with much less memory. At the same time, it naturally works well with Arkouda's array operators. Based on this succinct graph data structure, we have developed two typical graph algorithms, breadth-first search (BFS) and triangle counting. Both algorithms have been successfully integrated into Arkouda. Both are multi-locale algorithms, so they can handle a very large graph on distributed systems. Experimental results of BFS on a 32-node cluster system show that our method can build large graphs into distributed memory and execute the parallel BFS algorithm on typical sparse graph benchmarks and R-MAT generator-based graphs successfully. The performance results show that the distributed graph building time and BFS time increase linearly with the total number of edges. For future work, we will further optimize these graph algorithms and investigate the streaming versions in Arkouda. We acknowledge Mike Merrill and Bill Reus, the founding developers of the op...
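For readers unfamiliar with the data structures named in this abstract, the following is a minimal single-machine sketch of a suffix array, its LCP array, and pattern search over it. It does not use Arkouda or its API, and it is not the skew algorithm or the C library mentioned above: construction here is a naive sort of all suffixes, chosen for brevity, and the LCP array is computed with Kasai's algorithm. All function names are illustrative.

```python
def suffix_array(s):
    """Start positions of all suffixes of s, in lexicographic order.
    Naive construction by sorting; fine for short strings only."""
    return sorted(range(len(s)), key=lambda i: s[i:])

def lcp_array(s, sa):
    """lcp[r] = length of the common prefix of the suffixes at sa[r-1] and sa[r]
    (lcp[0] is 0). Kasai's algorithm, linear time given the suffix array."""
    n = len(s)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]          # suffix just before suffix i in sorted order
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1                   # next suffix shares at least h-1 characters
        else:
            h = 0
    return lcp

def find_all(s, sa, pattern):
    """All start positions of pattern in s, via two binary searches on sa."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:                       # first suffix whose prefix is >= pattern
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:                       # first suffix whose prefix is > pattern
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])
```

For `s = "banana"`, `suffix_array(s)` is `[5, 3, 1, 0, 4, 2]`, `lcp_array(s, sa)` is `[0, 1, 3, 0, 0, 2]`, and `find_all(s, sa, "ana")` is `[1, 3]`. Because all suffixes sharing a prefix are adjacent in the sorted order, each search costs a logarithmic number of string comparisons, which is what makes the structure attractive for the string analyses listed in the abstract.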

Citations: 0

A golden age for computing frontiers, a dark age for computing education?

Proceedings of the 18th ACM International Conference on Computing Frontiers Pub Date : 2021-05-11 DOI: 10.1145/3457388.3458673

C. Teuscher

Citations: 0

Exploring the potential of context-aware dynamic CPU undervolting

Proceedings of the 18th ACM International Conference on Computing Frontiers Pub Date : 2021-05-11 DOI: 10.1145/3457388.3458658

E. Maroudas, S. Lalis, Nikolaos Bellas, C. Antonopoulos

Abstract: CPU operation at sub-nominal voltage levels has been researched to reduce the power and energy consumption of computer systems. While it is possible to determine a safe undervolting level for each application, typically only the most conservative setting is applied statically across all workloads. In this paper, we go a step further and investigate the gains that can be achieved by dynamically and transparently changing the level of CPU undervolting at runtime. To enable this functionality, we design and implement a novel, OS-level, context-aware dynamic undervolting mechanism, able to decide and apply voltage levels according to the specific tolerance of each workload that executes on a multicore CPU at a particular time. Our mechanism can further differentiate between the user- and kernel-level code executed within the same application thread, enabling the exploitation of differences in their undervolting potential. User- and kernel-level code have inherently different characteristics, yet have never been characterized individually in previous work. Our experiments, on an Intel x86-64 multicore, show that the proposed approach can reduce the average CPU power consumption by 5.58%/30.05% compared to static undervolting and the nominal voltage level, respectively. Finally, we provide indicative estimates for the gains that could be achieved in future CPU architectures with multiple, per-core voltage domains.
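The two percentages in the abstract can be related with a short calculation. The sketch below assumes that both figures are averages over the same set of workloads and that they compose multiplicatively; neither assumption is stated in the abstract, so the derived number is only indicative. The function name is hypothetical.

```python
def implied_static_saving(saving_vs_static, saving_vs_nominal):
    """Saving of static undervolting relative to nominal voltage, implied by
    the dynamic scheme's savings against each baseline.

    If P_dyn = (1 - a) * P_static and P_dyn = (1 - b) * P_nominal,
    then P_static / P_nominal = (1 - b) / (1 - a).
    """
    return 1 - (1 - saving_vs_nominal) / (1 - saving_vs_static)

# Figures quoted in the abstract: 5.58% vs. static, 30.05% vs. nominal.
static_vs_nominal = implied_static_saving(0.0558, 0.3005)
print(f"{static_vs_nominal:.2%}")
```

Under these assumptions, static undervolting alone accounts for a saving of roughly 26% relative to nominal voltage, and the context-aware dynamic mechanism adds the remaining few percentage points on top of it.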

Citations: 3

When wearable technology meets computing in future networks: a road ahead

Proceedings of the 18th ACM International Conference on Computing Frontiers Pub Date : 2021-05-11 DOI: 10.1145/3457388.3458614

A. Ometov, Olga Chukhno, Nadezhda Chukhno, J. Nurmi, E. Lohan

Citations: 7