{"title":"Scaling up performance of fat nodes for HPC","authors":"Alejandro Rico","doi":"10.1145/3310273.3325137","DOIUrl":"https://doi.org/10.1145/3310273.3325137","url":null,"abstract":"Future computing systems will integrate an increasing number of compute elements in processors. Such systems must be designed to efficiently scale up and to provide effective synchronization semantics, fast data movement and resource management. At the same time, it is paramount to understand application characteristics to dimension hardware components and interfaces, while adapting the codes to better exploit performance through those features without wasting area or power. This talk will cover multiple technologies targeted to scale up performance of large processors and research insights around synchronization, coherence, bandwidth and resource management, developed during the co-design effort with HPC codes for future systems.","PeriodicalId":431860,"journal":{"name":"Proceedings of the 16th ACM International Conference on Computing Frontiers","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116746854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatially fine-grained air quality prediction based on DBU-LSTM","authors":"Liang Ge, Aoli Zhou, Hang Li, Junling Liu","doi":"10.1145/3310273.3322829","DOIUrl":"https://doi.org/10.1145/3310273.3322829","url":null,"abstract":"This paper proposes a general approach to predict the spatially fine-grained air quality. The model is based on deep bidirectional and unidirectional long short-term memory (DBU-LSTM) neural network, which can capture bidirectional temporal dependencies and spatial correlations from time series data. Urban heterogeneous data such as point of interest (POI) and road network are used to evaluate the similarities between urban regions. The tensor decomposition method is used to complete the missing historical air quality data of monitoring stations. We evaluate our approach on real data sources obtained in Beijing, and the experimental results show its advantages over baseline methods.","PeriodicalId":431860,"journal":{"name":"Proceedings of the 16th ACM International Conference on Computing Frontiers","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122787256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Approximate loop unrolling","authors":"M. Rodriguez-Cancio, B. Combemale, B. Baudry","doi":"10.1145/3310273.3323841","DOIUrl":"https://doi.org/10.1145/3310273.3323841","url":null,"abstract":"We introduce Approximate Unrolling, a compiler loop optimization that reduces execution time and energy consumption, exploiting code regions that can endure some approximation and still produce acceptable results. Specifically, this work focuses on counted loops that map a function over the elements of an array. Approximate Unrolling transforms loops similarly to Loop Unrolling. However, unlike its exact counterpart, our optimization does not unroll loops by adding exact copies of the loop's body. Instead, it adds code that interpolates the results of previous iterations.","PeriodicalId":431860,"journal":{"name":"Proceedings of the 16th ACM International Conference on Computing Frontiers","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126221848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CacheGuard: a security-enhanced directory architecture against continuous attacks","authors":"Kai Wang, Fengkai Yuan, Rui Hou, Jingqiang Lin, Z. Ji, Dan Meng","doi":"10.1145/3310273.3323051","DOIUrl":"https://doi.org/10.1145/3310273.3323051","url":null,"abstract":"Modern processor cores share the last-level cache and directory to improve resource utilization. Unfortunately, such sharing makes the cache vulnerable to cross-core cache side channel attacks. Recent studies show that information leakage through cross-core cache side channel attacks is a serious threat in different computing domains ranging from cloud servers and mobile phones to embedded devices. However, previous solutions have limitations of losing performance, lacking golden standards, requiring software support, or being easily bypassed. In this paper, we observe that most cross-core cache side channel attacks cause sensitive data to appear in a ping-pong pattern in continuous attack scenarios, where attackers need to launch numerous attacks in a short period of time. This paper proposes CacheGuard to defend against the continuous attacks. CacheGuard extends the directory architecture for capturing the ping-pong patterns. Once the ping-pong pattern of a cache line is captured, CacheGuard can secure the line with two pattern-oriented counteractions, Preload and Lock. The experimental evaluation demonstrates that CacheGuard can block the continuous attacks, and that it induces negligible performance degradation and hardware overhead.","PeriodicalId":431860,"journal":{"name":"Proceedings of the 16th ACM International Conference on Computing Frontiers","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131423282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"User-centered context-aware CPU/GPU power management for interactive applications on smartphones","authors":"Syuan-Yi Lin, C. King","doi":"10.1145/3310273.3322825","DOIUrl":"https://doi.org/10.1145/3310273.3322825","url":null,"abstract":"CPU/GPU frequency scheduling on smartphones that maintains users' quality of experience (QoE) while reducing power consumption has been studied extensively in the past. Most previous works focused on power-hungry applications such as video streaming or 3D games. However, the majority of people are light to medium users, using applications such as social networking, web browsing, etc. For such interactive applications, it is difficult to reduce power consumption, because their behaviors depend on the user's interactions and are hard to characterize. In this paper, we tackle this challenging problem by considering the influences of user contexts on their interaction behaviors. A context-aware CPU/GPU frequency scheduling governor is proposed that allocates CPU/GPU frequencies just enough to meet the workload under different stages of user interaction. Evaluations show that the proposed governor can save power consumption up to 25% compared to the default governor while keeping the users satisfied with the QoE.","PeriodicalId":431860,"journal":{"name":"Proceedings of the 16th ACM International Conference on Computing Frontiers","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130412449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance statistics and learning based detection of exploitative speculative attacks","authors":"Swastika Dutta, S. Sinha","doi":"10.1145/3310273.3322832","DOIUrl":"https://doi.org/10.1145/3310273.3322832","url":null,"abstract":"Most of the modern processors perform out-of-order speculative executions to maximise system performance. Spectre and Meltdown exploit these optimisations and execute certain instructions leading to leakage of confidential information of the victim. All the variants of this class of attacks necessarily exploit branch prediction or speculative execution. Using this insight, we develop a two step strategy to effectively detect these attacks using performance counter statistics, correlation coefficient model, deep neural network and fast Fourier transform. Our approach is expected to provide reliable, fast and highly accurate results with no perceivable loss in system performance or system overhead.","PeriodicalId":431860,"journal":{"name":"Proceedings of the 16th ACM International Conference on Computing Frontiers","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131931903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerating parallel graph computing with speculation","authors":"Shuo Ji, Yinliang Zhao, Qing Yi","doi":"10.1145/3310273.3323049","DOIUrl":"https://doi.org/10.1145/3310273.3323049","url":null,"abstract":"Nowadays distributed graph computing is widely used to process large amount of data on the internet. Communication overhead is a critical factor in determining the overall efficiency of graph algorithms. Through speculative prediction of the content of communications, we develop an optimization technique to significantly reduce the amount of communications needed for a class of graph algorithms. We have evaluated our optimization technique using five graph algorithms, Single-source shortest path, Connected Components, PageRank, Diameter, and Random Walk, on the Amazon EC2 clusters using different graph datasets. Our optimized implementations have reduced communication overhead by 21--93% for these algorithms, while keeping the error rates under 5%.","PeriodicalId":431860,"journal":{"name":"Proceedings of the 16th ACM International Conference on Computing Frontiers","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131587660","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysing the tor web with high performance graph algorithms","authors":"M. Bernaschi, Alessandro Celestini, Stefano Guarino, F. Lombardi, Enrico Mastrostefano","doi":"10.1145/3310273.3323918","DOIUrl":"https://doi.org/10.1145/3310273.3323918","url":null,"abstract":"The exploration and analysis of Web graphs has flourished in the recent past, producing a large number of relevant and interesting research results. However, the unique characteristics of the Tor network demand specific algorithms to explore and analyze it. Tor is an anonymity network that allows offering and accessing various Internet resources while guaranteeing a high degree of provider and user anonymity. So far the attention of the research community has focused on assessing the security of the Tor infrastructure. Most research work on the Tor network aimed at discovering protocol vulnerabilities to de-anonymize users and services, while little or no information is available about the topology of the Tor Web graph or the relationship between pages' content and topological structure. With our work we aim at addressing this lack of information. We describe the topology of the Tor Web graph, measuring both global and local properties by means of well-known metrics that, due to the size of the network, require high-performance algorithms. We consider three different snapshots obtained by extensively crawling Tor three times over a 5-month time frame. Finally, we present a correlation analysis of pages' semantics and topology, discussing novel insights about the Tor Web organization and its content. Our findings show that the Tor graph presents some of the characteristics of social and surface web graphs, along with a few unique peculiarities.","PeriodicalId":431860,"journal":{"name":"Proceedings of the 16th ACM International Conference on Computing Frontiers","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114424502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An adaptive concurrent priority queue for NUMA architectures","authors":"F. Strati, Christina Giannoula, Dimitrios Siakavaras, G. Goumas, N. Koziris","doi":"10.1145/3310273.3323164","DOIUrl":"https://doi.org/10.1145/3310273.3323164","url":null,"abstract":"Designing scalable concurrent priority queues for contemporary NUMA servers is challenging. Several NUMA-unaware implementations can scale up to a high number of threads exploiting the potential parallelism of the insert operations. In contrast, in deleteMin-dominated workloads, threads compete for accessing the same memory locations, i.e. the first item in the priority queue. In such cases, NUMA-aware implementations are typically used, since they reduce the coherence traffic between the nodes of a NUMA system. In this work, we propose an adaptive priority queue, called SmartPQ, that tunes itself by automatically switching between NUMA-unaware and NUMA-aware algorithmic modes to provide the highest available performance under all workloads. SmartPQ is built on top of NUMA Node Delegation (Nuddle), a low overhead technique to construct NUMA-aware data structures using any arbitrary NUMA-unaware implementation as its backbone. Moreover, SmartPQ employs machine learning to decide when to switch between its two algorithmic modes. As our evaluation reveals, it achieves the highest available performance with 88% success rate and dynamically adapts between a NUMA-aware and a NUMA-unaware mode, without overheads, while performing up to 1.83 times better than Spraylist, the state-of-the-art NUMA-unaware priority queue.","PeriodicalId":431860,"journal":{"name":"Proceedings of the 16th ACM International Conference on Computing Frontiers","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128087334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Designing a secure DRAM+NVM hybrid memory module","authors":"Xu Wang, I. Koren","doi":"10.1145/3310273.3323069","DOIUrl":"https://doi.org/10.1145/3310273.3323069","url":null,"abstract":"Non-Volatile Memory (NVM) such as PCM has emerged as a potential alternative for main memory due to its high density and low leakage power. However, an NVM main-memory system faces three challenges when compared to Dynamic Random Access Memory (DRAM) - long latency, poor write endurance and data security. To address these three challenges, we propose a secure DRAM+NVM hybrid memory module. The hybrid module integrates a DRAM cache and a security unit (SU). DRAM cache can improve the performance of an NVM memory module and reduce the number of direct writes to the NVM. Our results show that a 256MB 2-way DRAM cache with a 1024B cache line performs well in an 8GB NVM main memory module. The SU is embedded in the onboard controller and includes an AES-GCM engine and an NVM vault. The AES-GCM engine implements encryption and authentication with low overhead. The NVM vault is used to store MAC tags and counter values for each DRAM cache line. According to our results, the proposed secure hybrid memory module improves the performance by 32% compared to an NVM-only memory module, and is only 6.8% slower than a DRAM only memory module.","PeriodicalId":431860,"journal":{"name":"Proceedings of the 16th ACM International Conference on Computing Frontiers","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129158182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}