使用加速堆栈分析托管语言应用程序的可伸缩性

2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2017-04-24 DOI:10.1109/ISPASS.2017.7975267

Jennifer B. Sartor, Kristof Du Bois, Stijn Eyerman, L. Eeckhout

{"title":"使用加速堆栈分析托管语言应用程序的可伸缩性","authors":"Jennifer B. Sartor, Kristof Du Bois, Stijn Eyerman, L. Eeckhout","doi":"10.1109/ISPASS.2017.7975267","DOIUrl":null,"url":null,"abstract":"Understanding the reasons why multi-threaded applications do not achieve perfect scaling on modern multicore hardware is challenging. Furthermore, more and more modern programs are written in managed languages, which have extra service threads (e.g., to perform memory management), which may retard scalability and complicate performance analysis. In this paper, we extend speedup stacks, a previously-presented visualization tool to analyze multi-threaded program scalability, to managed applications. Speedup stacks are comprehensive bar graphs that break down an application's execution to explain the main causes of sublinear speedup, i.e., when some threads are not allowing the application to progress, and thus increasing the execution time. We not only expand speedup stacks to analyze how the managed language's service threads affect overall scalability, but also implement speedup stacks while running on native hardware. We monitor the application and service threads' scheduling behavior using light-weight OS kernel modules, incurring under 1% overhead running unmodified Java benchmarks. We add two performance delimiters targeting managed applications: garbage collection and main initialization activities. We analyze the scalability limitations of these benchmarks and the impact of using both a stop-the-world and a concurrent garbage collector with speedup stacks. Our visualization tool facilitates the identification of scalability bottlenecks both between application threads and of service threads, pointing developers to whether optimization should be focused on the language runtime or the application. Speedup stacks provide better program understanding for both program and system designers, which can help optimize multicore processor performance.","PeriodicalId":123307,"journal":{"name":"2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"91 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Analyzing the scalability of managed language applications with speedup stacks\",\"authors\":\"Jennifer B. Sartor, Kristof Du Bois, Stijn Eyerman, L. Eeckhout\",\"doi\":\"10.1109/ISPASS.2017.7975267\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Understanding the reasons why multi-threaded applications do not achieve perfect scaling on modern multicore hardware is challenging. Furthermore, more and more modern programs are written in managed languages, which have extra service threads (e.g., to perform memory management), which may retard scalability and complicate performance analysis. In this paper, we extend speedup stacks, a previously-presented visualization tool to analyze multi-threaded program scalability, to managed applications. Speedup stacks are comprehensive bar graphs that break down an application's execution to explain the main causes of sublinear speedup, i.e., when some threads are not allowing the application to progress, and thus increasing the execution time. We not only expand speedup stacks to analyze how the managed language's service threads affect overall scalability, but also implement speedup stacks while running on native hardware. We monitor the application and service threads' scheduling behavior using light-weight OS kernel modules, incurring under 1% overhead running unmodified Java benchmarks. We add two performance delimiters targeting managed applications: garbage collection and main initialization activities. We analyze the scalability limitations of these benchmarks and the impact of using both a stop-the-world and a concurrent garbage collector with speedup stacks. Our visualization tool facilitates the identification of scalability bottlenecks both between application threads and of service threads, pointing developers to whether optimization should be focused on the language runtime or the application. Speedup stacks provide better program understanding for both program and system designers, which can help optimize multicore processor performance.\",\"PeriodicalId\":123307,\"journal\":{\"name\":\"2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)\",\"volume\":\"91 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-04-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISPASS.2017.7975267\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISPASS.2017.7975267","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

理解多线程应用程序不能在现代多核硬件上实现完美扩展的原因是一项挑战。此外，越来越多的现代程序是用托管语言编写的，这些语言有额外的服务线程(例如，执行内存管理)，这可能会阻碍可伸缩性并使性能分析复杂化。在本文中，我们扩展了加速堆栈，一个以前提出的可视化工具来分析多线程程序的可伸缩性，以管理应用程序。加速堆栈是一个综合的条形图，它分解了应用程序的执行，以解释次线性加速的主要原因，例如，当某些线程不允许应用程序进行时，从而增加了执行时间。我们不仅扩展了加速堆栈来分析托管语言的服务线程如何影响整体可伸缩性，而且还在本机硬件上运行时实现了加速堆栈。我们使用轻量级的OS内核模块监视应用程序和服务线程的调度行为，在运行未经修改的Java基准测试时产生不到1%的开销。我们添加了两个针对托管应用程序的性能分隔符:垃圾收集和主初始化活动。我们分析了这些基准测试的可伸缩性限制，以及同时使用带有加速堆栈的stop-the-world和并发垃圾收集器的影响。我们的可视化工具有助于识别应用程序线程和服务线程之间的可伸缩性瓶颈，为开发人员指出优化应该集中在语言运行时还是应用程序上。加速堆栈为程序和系统设计人员提供了更好的程序理解，这有助于优化多核处理器的性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Analyzing the scalability of managed language applications with speedup stacks

Understanding the reasons why multi-threaded applications do not achieve perfect scaling on modern multicore hardware is challenging. Furthermore, more and more modern programs are written in managed languages, which have extra service threads (e.g., to perform memory management), which may retard scalability and complicate performance analysis. In this paper, we extend speedup stacks, a previously-presented visualization tool to analyze multi-threaded program scalability, to managed applications. Speedup stacks are comprehensive bar graphs that break down an application's execution to explain the main causes of sublinear speedup, i.e., when some threads are not allowing the application to progress, and thus increasing the execution time. We not only expand speedup stacks to analyze how the managed language's service threads affect overall scalability, but also implement speedup stacks while running on native hardware. We monitor the application and service threads' scheduling behavior using light-weight OS kernel modules, incurring under 1% overhead running unmodified Java benchmarks. We add two performance delimiters targeting managed applications: garbage collection and main initialization activities. We analyze the scalability limitations of these benchmarks and the impact of using both a stop-the-world and a concurrent garbage collector with speedup stacks. Our visualization tool facilitates the identification of scalability bottlenecks both between application threads and of service threads, pointing developers to whether optimization should be focused on the language runtime or the application. Speedup stacks provide better program understanding for both program and system designers, which can help optimize multicore processor performance.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)

自引率

0.00%

发文量