Pirah Noor Soomro, M. Abduljabbar, J. Castrillón, M. Pericàs
{"title":"An online guided tuning approach to run CNN pipelines on edge devices","authors":"Pirah Noor Soomro, M. Abduljabbar, J. Castrillón, M. Pericàs","doi":"10.1145/3457388.3458662","DOIUrl":"https://doi.org/10.1145/3457388.3458662","url":null,"abstract":"Modern edge and mobile devices are equipped with powerful computing resources. These are often organized as heterogeneous multi-cores, featuring performance-asymmetric core clusters. This raises the question of how to effectively execute the inference pass of convolutional neural networks (CNN) on such devices. Existing CNN implementations on edge devices leverage offline profiling data to determine a better schedule for CNN applications. This approach requires a time-consuming phase of generating a performance profile for each type of representative kernel on the various core configurations available on the device, coupled with a search space exploration. We propose an online tuning technique which utilizes compile-time hints and online profiling data to generate high-throughput CNN pipelines. We explore core heterogeneity and compatible core-layer configurations through an online guided search. Unlike exhaustive search, we adopt an evolutionary approach with a guided starting point in order to find the solution. We show that by pruning and navigating through the complex search space using compile-time hints, 79% of the tested configurations turn out to be near-optimal candidates for a throughput-maximizing pipeline on the NVIDIA Jetson TX2 platform.","PeriodicalId":136482,"journal":{"name":"Proceedings of the 18th ACM International Conference on Computing Frontiers","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121362286","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic row activation mechanism for multi-core systems","authors":"Tareq A. Alawneh, R. Kirner, C. Menon","doi":"10.1145/3457388.3458660","DOIUrl":"https://doi.org/10.1145/3457388.3458660","url":null,"abstract":"The power consumed by modern DRAM devices represents a significant portion of the overall system power in modern computing systems. In multi-core systems, the competing cores share the same memory banks. Memory contention between these cores may lead to activating a large DRAM row only to access a small portion of its data. This row over-fetching problem wastes significant DRAM activation power for only a slight performance gain. In this paper, we propose a dynamic row activation mechanism, in which the optimal size of DRAM rows is detected at run-time by monitoring behavioural changes in how memory requests access sub-rows. The proposed mechanism aims at providing significant memory power savings, reducing the average memory access latency, and maintaining the full DRAM bandwidth. Our experimental results using four-core multi-programming workloads reveal that the proposed mechanism can achieve both a significant memory power reduction and an improvement in average DRAM access latency with negligible area overhead.","PeriodicalId":136482,"journal":{"name":"Proceedings of the 18th ACM International Conference on Computing Frontiers","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125504851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abdurrahman Yasar, Kasimir Gabert, Ümit V. Çatalyürek
{"title":"Parallel graph algorithms by blocks: from I/O to algorithms","authors":"Abdurrahman Yasar, Kasimir Gabert, Ümit V. Çatalyürek","doi":"10.1145/3457388.3459987","DOIUrl":"https://doi.org/10.1145/3457388.3459987","url":null,"abstract":"In today's data-driven world and heterogeneous computing environments, processing large-scale graphs in an architecture-agnostic manner has become more crucial than ever before. In terms of graph analytics frameworks, on the one side, there has been a significant interest in developing hand-optimized high-performance computing solutions. On the systems side, following the big data movement and to bring parallel computing to the masses, researchers have proposed several graph processing and management systems to handle large-scale graphs. Hand-optimized HPC approaches require high expertise and are expensive to maintain and develop, and graph processing frameworks suffer from limited expressibility and performance. We propose Parallel Graph Algorithms by Blocks (PGAbB), a block-based graph algorithms framework for shared-memory, multi-core, multi-GPU machines. PGAbB offers a sweet spot between efficient parallelism and architecture-agnostic algorithm design for a wide class of graph problems while performing close to hand-optimized HPC implementations. While our PGAbB framework, as well as many other recent HPC graph-analytics frameworks, are highly tuned and able to run complex graph analytics in fractions of seconds on billion-edge graphs, there remains a gap in their end-to-end use. Despite the significant improvements that modern hardware and operating systems have made towards input and output, reading the graph from file systems easily takes thousands of times longer than running the computational kernel itself. This slowdown causes both a disconnect for end users and a loss of productivity for researchers and developers. We close this gap by providing a simple-to-use, small, header-only, and dependency-free C++11 library, PIGO, that brings I/O improvements to graph and sparse matrix systems. Using PIGO, we improve the end-to-end performance for state-of-the-art systems significantly, in many cases by over 40X.","PeriodicalId":136482,"journal":{"name":"Proceedings of the 18th ACM International Conference on Computing Frontiers","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114362323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fault injection attacks on SoftMax function in deep neural networks","authors":"Dirmanto Jap, Yoo-Seung Won, S. Bhasin","doi":"10.1145/3457388.3458870","DOIUrl":"https://doi.org/10.1145/3457388.3458870","url":null,"abstract":"Softmax is a commonly used activation function in neural networks that normalizes the output to a probability distribution over predicted classes. Being often deployed in the output layer, it can potentially be targeted by fault injection attacks to create misclassification. In this extended abstract, we perform a preliminary fault analysis of Softmax against single-bit faults.","PeriodicalId":136482,"journal":{"name":"Proceedings of the 18th ACM International Conference on Computing Frontiers","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133583994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
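The effect of a single-bit fault on a Softmax output can be illustrated with a small sketch. It assumes the fault hits the IEEE 754 float32 encoding of one input logit; the fault model actually studied in the extended abstract may differ, and the helper names `flip_bit` and `predicted_class` are hypothetical, chosen only for this illustration.

```python
import math
import struct


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def flip_bit(value, bit):
    """Flip one bit (0 = least significant) of the float32 encoding of value.

    Hypothetical helper: models a single-bit fault in a stored logit.
    """
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    (faulty,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return faulty


def predicted_class(logits):
    """Index of the class with the highest softmax probability."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)


# Fault-free logits: class 0 wins.
logits = [2.0, 1.0, 0.5]

# Assumed fault: flip the most significant exponent bit (bit 30) of the
# logit for class 2, turning 0.5 into 2**127.
faulty_logits = list(logits)
faulty_logits[2] = flip_bit(logits[2], 30)

print(predicted_class(logits), predicted_class(faulty_logits))
```

In this sketch a flip in a high exponent bit changes the predicted class, whereas flips in low mantissa bits perturb the probabilities only slightly; which bits matter in a real deployment depends on the number format and on where the fault lands.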
{"title":"Near real-time intrusion alert aggregation using concept-based learning","authors":"Gordon Werner, S. Yang, K. McConky","doi":"10.1145/3457388.3458663","DOIUrl":"https://doi.org/10.1145/3457388.3458663","url":null,"abstract":"Intrusion detection systems generate a large number of streaming alerts. It can be overwhelming for analysts to quickly and effectively find related alerts stemming from correlated attack actions. What if fast-arriving alerts could be automatically processed with no prior knowledge to find related actions in near real-time? The Concept Learning for Intrusion Event Aggregation in Realtime (CLEAR) system aims to learn and update an evolving set of temporal 'concepts,' each consisting of aggregates of related alerts that exhibit similar statistical arrival patterns. With no training data, the system constructs the concepts in near real-time from statistically similar alert aggregates. Tracked concepts are then applied to incoming alerts for fast and high-fidelity aggregation. The concepts learned by CLEAR are significantly more unique and invariant when compared to those learned by alternative drift detection methods. Furthermore, it provides insights into how specific individual, or co-occurring, alerts arrive with distinct and consistent temporal patterns.","PeriodicalId":136482,"journal":{"name":"Proceedings of the 18th ACM International Conference on Computing Frontiers","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128455903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging ML to handle the increasing complexity of the cloud","authors":"Christina Delimitrou","doi":"10.1145/3457388.3460425","DOIUrl":"https://doi.org/10.1145/3457388.3460425","url":null,"abstract":"Cloud services are increasingly adopting new programming models, such as microservices and serverless compute. While these frameworks offer several advantages, such as better modularity and ease of maintenance and deployment, they also introduce new hardware and software challenges. In this talk, I will briefly discuss the challenges that these new cloud models introduce in hardware and software, and present some of our work on employing ML to improve the cloud's performance predictability and resource efficiency. I will first discuss Seer, a performance debugging system that identifies root causes of unpredictable performance in multi-tier interactive microservices, and Sage, which improves on Seer by taking a completely unsupervised learning approach to data-driven performance debugging, making it both practical and scalable.","PeriodicalId":136482,"journal":{"name":"Proceedings of the 18th ACM International Conference on Computing Frontiers","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123108635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interactive data science at scale","authors":"David A. Bader","doi":"10.1145/3457388.3459985","DOIUrl":"https://doi.org/10.1145/3457388.3459985","url":null,"abstract":"A real-world challenge in data science is to develop interactive methods for quickly analyzing new and novel data sets that are potentially of massive scale. In this talk, we discuss our development of suffix array and graph algorithms in the context of Arkouda, a NumPy-like replacement for interactive data science on tens of terabytes of data. Many real-world applications in bioinformatics, web information search and analysis, and lossless compression can be abstracted as string analysis. Suffix arrays are a very efficient data structure to support quick search of any string pattern. We have integrated the suffix array data structure (including its enhanced Longest Common Prefix (LCP) array) and the corresponding construction algorithm into Arkouda, thus providing Python users with a powerful method to support different types of string analysis. Our novel approach integrates a suffix array algorithm library into Arkouda so that the Arkouda runtime can select the large suffix array construction algorithm dynamically based on the dataset properties. Two of the implemented methods on the back-end of Arkouda include our novel O(n) time complexity skew algorithm in Chapel, and the DivSufSort suffix array construction algorithm in C, which has higher time complexity but often is faster in practice. Experimental results show that, supported by Arkouda, Python users can easily build a large-scale string's suffix array and LCP array in a Jupyter notebook without losing any performance compared with direct back-end operation. Our future work is extending our self-developed algorithm to support multi-locale parallel execution, so that our algorithm can handle large strings on distributed systems. Graphs are widely used to abstract problems in domains such as social sciences, biological systems, and information systems. To support real-world large graph analysis in Arkouda, we first developed an array-based graph data structure which can be used like an adjacency matrix or incidence matrix but with much less memory. At the same time, it naturally works well with Arkouda's array operators. Based on this succinct graph data structure, we have developed two typical graph algorithms, breadth-first search (BFS) and triangle counting. Both algorithms have been successfully integrated into Arkouda. Both are multi-locale algorithms so they can handle a very large graph on distributed systems. Experimental results of BFS on a 32-node cluster system show that our method can build large graphs into distributed memory and execute the parallel BFS algorithm on typical sparse graph benchmarks and R-MAT generator-based graphs successfully. The performance results show that the distributed graph building time and BFS time increase linearly with the total number of edges. For future work, we will further optimize these graph algorithms and investigate the streaming versions in Arkouda. We acknowledge Mike Merrill and Bill Reus, the founding developers of the op","PeriodicalId":136482,"journal":{"name":"Proceedings of the 18th ACM International Conference on Computing Frontiers","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134098068","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
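The idea of running BFS over a graph held in flat arrays, rather than an adjacency matrix, can be sketched as follows. This is a single-process illustration only: it does not reproduce Arkouda's actual data structure or its multi-locale Chapel implementation, and the names `build_edge_arrays` and `bfs_levels` are hypothetical.

```python
from collections import deque


def build_edge_arrays(edges, num_vertices):
    """Build two flat arrays from a directed edge list.

    neighbors holds destination vertices grouped by source vertex;
    start[v]..start[v + 1] is the slice of neighbors that belongs to v.
    Memory is O(vertices + edges) instead of O(vertices ** 2).
    """
    ordered = sorted(edges)
    neighbors = [dst for _, dst in ordered]
    start = [0] * (num_vertices + 1)
    for src, _ in ordered:
        start[src + 1] += 1
    for v in range(num_vertices):
        start[v + 1] += start[v]
    return start, neighbors


def bfs_levels(start, neighbors, root):
    """Return the BFS depth of every vertex from root (-1 if unreachable)."""
    depth = [-1] * (len(start) - 1)
    depth[root] = 0
    frontier = deque([root])
    while frontier:
        u = frontier.popleft()
        for i in range(start[u], start[u + 1]):
            v = neighbors[i]
            if depth[v] == -1:
                depth[v] = depth[u] + 1
                frontier.append(v)
    return depth


# Small example graph with six vertices; vertex 5 is isolated.
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
start, neighbors = build_edge_arrays(edges, 6)
print(bfs_levels(start, neighbors, 0))
```

Each vertex and each edge is visited at most once, so the traversal cost in this sketch grows linearly with the number of edges.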
{"title":"A golden age for computing frontiers, a dark age for computing education?","authors":"C. Teuscher","doi":"10.1145/3457388.3458673","DOIUrl":"https://doi.org/10.1145/3457388.3458673","url":null,"abstract":"There is no doubt that the body of knowledge spanned by the computing disciplines has gone through an unprecedented expansion, both in depth and breadth, over the last century. In this position paper, we argue that this expansion has led to a crisis in computing education: quite literally the vast majority of the topics of interest of this conference are not taught at the undergraduate level and most graduate courses will only scratch the surface of a few selected topics. But alas, industry is increasingly expecting students to be familiar with emerging topics, such as neuromorphic, probabilistic, and quantum computing, AI, and deep learning. We provide evidence for the rapid growth of emerging topics, highlight the decline of traditional areas, muse about the failure of higher education to adapt quickly, and delineate possible ways to avert the crisis by looking at how the field of physics dealt with significant expansions over the last centuries.","PeriodicalId":136482,"journal":{"name":"Proceedings of the 18th ACM International Conference on Computing Frontiers","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126533932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
E. Maroudas, S. Lalis, Nikolaos Bellas, C. Antonopoulos
{"title":"Exploring the potential of context-aware dynamic CPU undervolting","authors":"E. Maroudas, S. Lalis, Nikolaos Bellas, C. Antonopoulos","doi":"10.1145/3457388.3458658","DOIUrl":"https://doi.org/10.1145/3457388.3458658","url":null,"abstract":"CPU operation at sub-nominal voltage levels has been researched to reduce the power and energy consumption of computer systems. While it is possible to determine a safe undervolting level for each application, typically only the most conservative setting is applied statically across all workloads. In this paper, we go a step further and investigate the gains that can be achieved by dynamically and transparently changing the level of CPU undervolting at runtime. To enable this functionality, we design and implement a novel, OS-level, context-aware dynamic undervolting mechanism, able to decide and apply voltage levels according to the specific tolerance of each workload that executes on a multicore CPU at a particular time. Our mechanism can further differentiate between the user- and kernel-level code executed within the same application thread, enabling the exploitation of differences in their undervolting potential. User- and kernel-level code have inherently different characteristics, yet previous work has never characterized them individually. Our experiments on an Intel x86-64 multicore show that the proposed approach can reduce the average CPU power consumption by 5.58%/30.05% compared to static undervolting and the nominal voltage level, respectively. Finally, we provide indicative estimates for the gains that could be achieved in future CPU architectures with multiple, per-core voltage domains.","PeriodicalId":136482,"journal":{"name":"Proceedings of the 18th ACM International Conference on Computing Frontiers","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129560556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Ometov, Olga Chukhno, Nadezhda Chukhno, J. Nurmi, E. Lohan
{"title":"When wearable technology meets computing in future networks: a road ahead","authors":"A. Ometov, Olga Chukhno, Nadezhda Chukhno, J. Nurmi, E. Lohan","doi":"10.1145/3457388.3458614","DOIUrl":"https://doi.org/10.1145/3457388.3458614","url":null,"abstract":"Rapid technology advancement, economic growth, and industrialization have paved the way for developing a new niche of small body-worn personal devices, gathered together under the wearable-technology title. The triggers stimulated by end-user interest have introduced the first generation of mass-consumer wearables in just the past decade. Evidently, the trailblazing ones were not designed with strict energy-consumption restrictions in mind. Thus, wearable-computing-related research remained fragmented. Advanced and sophisticated batteries and communication technologies may already be available on devices. Additional solutions for efficient utilization of processing power are still a blank spot on the wearable technology roadmap. The A-WEAR EU project aims to enhance the understanding of how the superimposition of those technologies would improve wearable devices' energy efficiency, with the research area being far from saturation. We foresee enormous room for research as the Edge computing paradigm is emerging towards hand-held devices.","PeriodicalId":136482,"journal":{"name":"Proceedings of the 18th ACM International Conference on Computing Frontiers","volume":"133 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115133847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}