Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems最新文献_第2页

Session details: Session 1B: Managed Runtimes and Dynamic Translation 会话1B:管理运行时和动态翻译

Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems Pub Date : 2018-03-19 DOI: 10.1145/3252953

Lei Liu

引用次数: 0

Static Detection of Event-based Races in Android Apps Android应用中基于事件的竞赛的静态检测

Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems Pub Date : 2018-03-19 DOI: 10.1145/3173162.3173173

Yongjian Hu, Iulian Neamtiu

{"title":"Static Detection of Event-based Races in Android Apps","authors":"Yongjian Hu, Iulian Neamtiu","doi":"10.1145/3173162.3173173","DOIUrl":"https://doi.org/10.1145/3173162.3173173","url":null,"abstract":"Event-based races are the main source of concurrency errors in Android apps. Prior approaches for scalable detection of event-based races have been dynamic. Due to their dynamic nature, these approaches suffer from coverage and false negative issues. We introduce a precise and scalable static approach and tool, named SIERRA, for detecting Android event-based races. SIERRA is centered around a new concept of \"concurrency action\" (that reifies threads, events/messages, system and user actions) and statically-derived order (happens-before relation) between actions. Establishing action order is complicated in Android, and event-based systems in general, because of externally-orchestrated control flow, use of callbacks, asynchronous tasks, and ad-hoc synchronization. We introduce several novel approaches that enable us to infer order relations statically: auto-generated code models which impose order among lifecycle and GUI events; a novel context abstraction for event-driven programs named action-sensitivity and finally, on-demand path sensitivity via backward symbolic execution to further rule out false positives. We have evaluated SIERRA on 194 Android apps. Of these, we chose 20 apps for manual analysis and comparison with a state-of-the-art dynamic race detector. Experimental results show that SIERRA is effective and efficient, typically taking 960 seconds to analyze an app and revealing 43 potential races. Compared with the dynamic race detector, SIERRA discovered an average 29.5 true races with 3.5 false positives, where the dynamic detector only discovered 4 races (hence missing 25.5 races per app) -- this demonstrates the advantage of a precise static approach. We believe that our approach opens the way for precise analysis and static event race detection in other event-driven systems beyond Android.","PeriodicalId":302876,"journal":{"name":"Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131717051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 22

In-Memory Data Parallel Processor 内存数据并行处理器

Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems Pub Date : 2018-03-19 DOI: 10.1145/3173162.3173171

Daichi Fujiki, S. Mahlke, R. Das

{"title":"In-Memory Data Parallel Processor","authors":"Daichi Fujiki, S. Mahlke, R. Das","doi":"10.1145/3173162.3173171","DOIUrl":"https://doi.org/10.1145/3173162.3173171","url":null,"abstract":"Recent developments in Non-Volatile Memories (NVMs) have opened up a new horizon for in-memory computing. Despite the significant performance gain offered by computational NVMs, previous works have relied on manual mapping of specialized kernels to the memory arrays, making it infeasible to execute more general workloads. We combat this problem by proposing a programmable in-memory processor architecture and data-parallel programming framework. The efficiency of the proposed in-memory processor comes from two sources: massive parallelism and reduction in data movement. A compact instruction set provides generalized computation capabilities for the memory array. The proposed programming framework seeks to leverage the underlying parallelism in the hardware by merging the concepts of data-flow and vector processing. To facilitate in-memory programming, we develop a compilation framework that takes a TensorFlow input and generates code for our in-memory processor. Our results demonstrate 7.5x speedup over a multi-core CPU server for a set of applications from Parsec and 763x speedup over a server-class GPU for a set of Rodinia benchmarks.","PeriodicalId":302876,"journal":{"name":"Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133579818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 114

Blasting through the Front-End Bottleneck with Shotgun 用散弹枪爆破前端瓶颈

Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems Pub Date : 2018-03-19 DOI: 10.1145/3173162.3173178

Rakesh Kumar, Boris Grot, V. Nagarajan

{"title":"Blasting through the Front-End Bottleneck with Shotgun","authors":"Rakesh Kumar, Boris Grot, V. Nagarajan","doi":"10.1145/3173162.3173178","DOIUrl":"https://doi.org/10.1145/3173162.3173178","url":null,"abstract":"The front-end bottleneck is a well-established problem in server workloads owing to their deep software stacks and large instruction working sets. Despite years of research into effective L1-I and BTB prefetching, state-of-the-art techniques force a trade-off between performance and metadata storage costs. This work introduces Shotgun, a BTB-directed front-end prefetcher powered by a new BTB organization that maintains a logical map of an application's instruction footprint, which enables high-efficacy prefetching at low storage cost. To map active code regions, Shotgun precisely tracks an application's global control flow (e.g., function and trap routine entry points) and summarizes local control flow within each code region. Because the local control flow enjoys high spatial locality, with most functions comprised of a handful of instruction cache blocks, it lends itself to a compact region-based encoding. Meanwhile, the global control flow is naturally captured by the application's unconditional branch working set (calls, returns, traps). Based on these insights, Shotgun devotes the bulk of its BTB capacity to branches responsible for the global control flow and a spatial encoding of their target regions. By effectively capturing a map of the application's instruction footprint in the BTB, Shotgun enables highly effective BTB-directed prefetching. Using a storage budget equivalent to a conventional BTB, Shotgun outperforms the state-of-the-art BTB-directed front-end prefetcher by up to 14% on a set of varied commercial workloads.","PeriodicalId":302876,"journal":{"name":"Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124887134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 46

Gloss: Seamless Live Reconfiguration and Reoptimization of Stream Programs 光泽:无缝现场重新配置和流程序的重新优化

Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems Pub Date : 2018-03-19 DOI: 10.1145/3173162.3173170

S. Rajadurai, Jeffrey Bosboom, W. Wong, Saman P. Amarasinghe

{"title":"Gloss: Seamless Live Reconfiguration and Reoptimization of Stream Programs","authors":"S. Rajadurai, Jeffrey Bosboom, W. Wong, Saman P. Amarasinghe","doi":"10.1145/3173162.3173170","DOIUrl":"https://doi.org/10.1145/3173162.3173170","url":null,"abstract":"An important class of applications computes on long-running or infinite streams of data, often with known fixed data rates. The latter is referred to as synchronous data flow ~(SDF) streams. These stream applications need to run on clusters or the cloud due to the high performance requirement. Further, they require live reconfiguration and reoptimization for various reasons such as hardware maintenance, elastic computation, or to respond to fluctuations in resources or application workload. However, reconfiguration and reoptimization without downtime while accurately preserving program state in a distributed environment is difficult. In this paper, we introduce Gloss, a suite of compiler and runtime techniques for live reconfiguration of distributed stream programs. Gloss, for the first time, avoids periods of zero throughput during the reconfiguration of both stateless and stateful SDF based stream programs. Furthermore, unlike other systems, Gloss globally reoptimizes and completely recompiles the program during reconfiguration. This permits it to reoptimize the application for entirely new configurations that it may not have encountered before. All these Gloss operations happen in-situ, requiring no extra hardware resources. We show how Gloss allows stream programs to reconfigure and reoptimize with no downtime and minimal overhead, and demonstrate the wider applicability of it via a variety of experiments.","PeriodicalId":302876,"journal":{"name":"Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121161008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 12

Enhancing Cross-ISA DBT Through Automatically Learned Translation Rules 通过自动学习翻译规则增强跨isa DBT

Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems Pub Date : 2018-03-19 DOI: 10.1145/3173162.3177160

Wenwen Wang, Stephen McCamant, Antonia Zhai, P. Yew

引用次数: 16

Minnow: Lightweight Offload Engines for Worklist Management and Worklist-Directed Prefetching Minnow:用于工作列表管理和工作列表定向预取的轻量级卸载引擎

Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems Pub Date : 2018-03-19 DOI: 10.1145/3173162.3173197

Dan Zhang, Xiaoyu Ma, Michael Thomson, Derek Chiou

{"title":"Minnow: Lightweight Offload Engines for Worklist Management and Worklist-Directed Prefetching","authors":"Dan Zhang, Xiaoyu Ma, Michael Thomson, Derek Chiou","doi":"10.1145/3173162.3173197","DOIUrl":"https://doi.org/10.1145/3173162.3173197","url":null,"abstract":"The importance of irregular applications such as graph analytics is rapidly growing with the rise of Big Data. However, parallel graph workloads tend to perform poorly on general-purpose chip multiprocessors (CMPs) due to poor cache locality, low compute intensity, frequent synchronization, uneven task sizes, and dynamic task generation. At high thread counts, execution time is dominated by worklist synchronization overhead and cache misses. Researchers have proposed hardware worklist accelerators to address scheduling costs, but these proposals often harden a specific scheduling policy and do not address high cache miss rates. We address this with Minnow, a technique that augments each core in a CMP with a lightweight Minnow accelerator. Minnow engines offload worklist scheduling from worker threads to improve scalability. The engines also perform worklist-directed prefetching, a technique that exploits knowledge of upcoming tasks to issue nearly perfectly accurate and timely prefetch operations. On a simulated 64-core CMP running a parallel graph benchmark suite, Minnow improves scalability and reduces L2 cache misses from 29 to 1.2 MPKI on average, resulting in 6.01x average speedup over an optimized software baseline for only 1% area overhead.","PeriodicalId":302876,"journal":{"name":"Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128405777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 37

Session details: Session 4B: Program Analysis 会话详细信息:会话4B:程序分析

Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems Pub Date : 2018-03-19 DOI: 10.1145/3252959

Shan Lu

引用次数: 0

NEOFog: Nonvolatility-Exploiting Optimizations for Fog Computing NEOFog:雾计算的非易失性优化

Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems Pub Date : 2018-03-19 DOI: 10.1145/3173162.3177154

Kaisheng Ma, Xueqing Li, M. Kandemir, J. Sampson, N. Vijaykrishnan, Jinyang Li, Tongda Wu, Zhibo Wang, Yongpan Liu, Yuan Xie

{"title":"NEOFog: Nonvolatility-Exploiting Optimizations for Fog Computing","authors":"Kaisheng Ma, Xueqing Li, M. Kandemir, J. Sampson, N. Vijaykrishnan, Jinyang Li, Tongda Wu, Zhibo Wang, Yongpan Liu, Yuan Xie","doi":"10.1145/3173162.3177154","DOIUrl":"https://doi.org/10.1145/3173162.3177154","url":null,"abstract":"Nonvolatile processors have emerged as one of the promising solutions for energy harvesting scenarios, among which Wireless Sensor Networks (WSN) provide some of the most important applications. In a typical distributed sensing system, due to difference in location, energy harvester angles, power sources, etc. different nodes may have different amount of energy ready for use. While prior approaches have examined these challenges, they have not done so in the context of the features offered by nonvolatile computing approaches, which disrupt certain foundational assumptions. We propose a new set of nonvolatility-exploiting optimizations and embody them in the NEOFog system architecture. We discuss shifts in the tradeoffs in data and program distribution for nonvolatile processing-based WSNs, showing how non-volatile processing and non-volatile RF support alter the benefits of computation and communication-centric approaches. We also propose a new algorithm specific to nonvolatile sensing systems for load balancing both computation and communication demands. Collectively, the NV-aware optimizations in NEOFog increase the ability to perform in-fog processing by 4.2X and can increase this to 8X if virtualized nodes are 3X multiplexed.","PeriodicalId":302876,"journal":{"name":"Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121058990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 15

Session details: Session 7B: Memory 2 会话详细信息:会话7B:内存2

Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems Pub Date : 2018-03-19 DOI: 10.1145/3252965

S. Blackburn

引用次数: 0