Extending LDMS to Enable Performance Monitoring in Multi-core Applications

2015 IEEE International Conference on Cluster Computing. Pub Date: 2015-09-08. DOI: 10.1109/CLUSTER.2015.125

S. Feldman, Deli Zhang, D. Dechev, J. Brandt


Citations: 7

Abstract

Identifying design patterns that limit the performance of multi-core algorithms is a challenging task. There are many known methods by which threads synchronize their actions, and each method may exhibit different behavior in different use cases. These use cases may vary with regard to the workload being executed, the number of parallel tasks, the dependencies between these tasks, and the behavior of the system scheduler. Restructuring algorithms to overcome performance limitations requires intimate knowledge of how these algorithms utilize the hardware. In our experience, we have found a lack of adequate tools to gain such knowledge. To address this, we have enhanced existing data sampler modules and implemented additional ones for OVIS's Lightweight Distributed Metric Service (LDMS) to enable scalable distributed collection of hardware performance counter data. These modules provide an interface by which LDMS can utilize the PAPI library, Linux perf tools, and RAPL to collect hardware performance data of interest. Using these samplers, we plan to monitor the intra-node behavior of multi-core applications, including contention for node-level shared resources, across a diverse set of use cases. We are currently exploring how the reported values are affected by the level of concurrency, the synchronization methodologies, and progress guarantees. We hope to use this information to identify ways to restructure algorithms to increase their performance.
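As a rough illustration of what a periodic sampler has to do with hardware counter data, the sketch below converts a series of raw readings from a monotonically increasing counter into per-interval deltas, tolerating one wraparound per interval. It is not LDMS, PAPI, or perf code: the function name `interval_deltas` and the `modulus` parameter (the number of distinct values the counter can hold) are assumptions made for this example, and real samplers may handle overflow differently.

```python
from typing import Iterable, List


def interval_deltas(readings: Iterable[int], modulus: int) -> List[int]:
    """Convert raw counter readings into per-interval increases.

    Assumptions (hypothetical, for illustration only):
    - the counter only counts upward and restarts at 0 after modulus - 1;
    - it wraps at most once between two consecutive readings.
    """
    values = list(readings)
    # Modular subtraction yields the correct increase whether or not
    # the counter wrapped between the two readings.
    return [(curr - prev) % modulus for prev, curr in zip(values, values[1:])]


# Hypothetical readings from a 5-bit counter (32 distinct values);
# the counter wraps between the second and third reading.
deltas = interval_deltas([10, 25, 5], modulus=32)
```

Reporting deltas rather than raw totals makes samples taken at different concurrency levels or with different synchronization methods directly comparable per interval, which is the kind of comparison the abstract describes.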
