{"title":"Visualization of Performance Data for MPI Applications Using Circular Hierarchies","authors":"Felix Schmitt, R. Dietrich, R. Kuß, J. Doleschal, A. Knüpfer","doi":"10.1109/VPA.2014.5","DOIUrl":"https://doi.org/10.1109/VPA.2014.5","url":null,"abstract":"One of the challenges for the developer of highly-parallel MPI applications running on distributed high performance computing systems is to understand the complex behavior of their applications. It requires to identify inefficiencies, and to optimize them such that communication waiting times can be reduced. This task can only be accomplished with the help of elaborated tools that provide insight into the details of the application using an automatic analysis or an intuitive visualization approach. While the first can only target a specific problem domain, the latter allows humans to discuss performance problems with a broader view and from multiple perspectives. We present a new visualization technique for performance data of MPI applications based on circular hierarchies. It intuitively presents communication patterns and allows developers to correlate those with arbitrary performance metrics. A hierarchy-aware layout increases scalability and helps to identify communication inefficiencies by analyzing and integrating the system's hardware topology. We discuss both our approach as well as its integration into the Score-P performance analysis work flow. Its applicability is presented with a real-world use case of the COSMO+SPECS+FD4 climate simulation code.","PeriodicalId":160141,"journal":{"name":"2014 First Workshop on Visual Performance Analysis","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116246088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discovering Barriers to Efficient Execution, Both Obvious and Subtle, Using Instruction-Level Visualization","authors":"David M. Koppelman, C. J. Michael","doi":"10.1109/VPA.2014.11","DOIUrl":"https://doi.org/10.1109/VPA.2014.11","url":null,"abstract":"CPU performance is determined by the interaction between available resources, microarchitectural features, the execution of instructions, and by the data. These elements can interact in complex ways, making it difficult for those seeing only aggregate performance numbers, such as miss ratios and issue rates, to determine whether there are reasonable avenues for performance improvement. A technique called instruction-level visualization helps users connect these disparate elements by showing the timing of the execution of individual program instructions. The PSE visualization program enhances instruction-level visualization by showing which instructions contribute to execution inefficiency in a way that makes it easy to locate dependent instructions and the history of events affecting the instruction. A simple annotation system makes it easy for a user to attach custom information. PSE has been used for microarchitecture research, simulator debugging, and for instructional use.","PeriodicalId":160141,"journal":{"name":"2014 First Workshop on Visual Performance Analysis","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127914646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CommGram: A New Visual Analytics Tool for Large Communication Trace Data","authors":"Jieting Wu, Jianping Zeng, Hongfeng Yu, J. Kenny","doi":"10.1109/VPA.2014.8","DOIUrl":"https://doi.org/10.1109/VPA.2014.8","url":null,"abstract":"The performance of massively parallel program is often impacted by the cost of communication across computing nodes. Analysis of communication patterns is critical for understanding and optimizing massively parallel programs. Visualization can help identify potential communication bottlenecks by displaying message trace data. However, the visual clutter and temporal incoherence problems are typically incurred in existing visualization tools for a considerable number of processors. In this paper, we present a new tool, named CommGram, which supports visual analysis of communication patterns for massive parallel MPI programs. With the benefit of MPI trace library DUMPI of SST, our framework builds hierarchical clustering trees for computational community domain, and takes advantage of graphical user interface (GUI) to convey communication patterns at different levels of detail. The effectiveness of our tool is demonstrated using large-scale parallel applications.","PeriodicalId":160141,"journal":{"name":"2014 First Workshop on Visual Performance Analysis","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125254916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Down to Earth - How to Visualize Traffic on High-dimensional Torus Networks","authors":"Lucas Theisen, A. Shah, F. Wolf","doi":"10.1109/VPA.2014.6","DOIUrl":"https://doi.org/10.1109/VPA.2014.6","url":null,"abstract":"High-dimensional torus networks are becoming common in flagship HPC systems, with five of the top ten systems in June 2014 having networks with more than three dimensions. Although such networks combine performance with scalability at reasonable cost, the challenge of how to achieve optimal performance remains. Tools are needed to help understand how well the traffic is distributed among the many dimensions. This involves not only capturing network traffic but also its comprehensible visualization. However, visualizing such networks requires projecting multiple dimensions onto a two-dimensional screen, which is naturally challenging. To tackle this problem, in this position paper, we propose a visualization technique which can display traffic on torus networks with up to six dimensions. Our fundamental approach is to simultaneously present multiple views of the same network section, with each view visualizing different dimensions. Furthermore, we leverage the multiple-coordinate system concept and combine it with a customized polygon view to provide both a global and a zoomed-in perspective of the network. By interactively linking all the views, our technique makes it possible to analyze how the communication pattern of an application is mapped onto a network.","PeriodicalId":160141,"journal":{"name":"2014 First Workshop on Visual Performance Analysis","volume":"264 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131559648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TorusVis^ND: Unraveling High-Dimensional Torus Networks for Network Traffic Visualizations","authors":"Shenghui Cheng, Pradipta De, S. Jiang, K. Mueller","doi":"10.1109/VPA.2014.7","DOIUrl":"https://doi.org/10.1109/VPA.2014.7","url":null,"abstract":"Torus networks are widely used in supercomputing. However, due to their complex topology and their large number of nodes, it is difficult for analysts to perceive the messages flow in these networks. We propose a visualization framework called TorusVisND that uses modern information visualization techniques to allow analysts to see the network and its communication patterns in a single display and control the amount of information shown via filtering in the temporal and the topology domains. For this purpose we provide three cooperating visual interfaces. The main interface is the network display. It uses two alternate graph numbering schemes -- a sequential curve and a Hilbert curve -- to unravel the 5D torus network into a single string of nodes. We then arrange these nodes onto a circle and add the communication links as line bundles in the circle interior. A node selector based on parallel coordinates and a time slicer based on ThemeRiver help users focus on certain processor groups and time slices in the network display. We demonstrate our approach via a small use case.","PeriodicalId":160141,"journal":{"name":"2014 First Workshop on Visual Performance Analysis","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130358349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Visualizing the Five-dimensional Torus Network of the IBM Blue Gene/Q","authors":"Collin M. McCarthy, Katherine E. Isaacs, A. Bhatele, P. Bremer, B. Hamann","doi":"10.1109/VPA.2014.10","DOIUrl":"https://doi.org/10.1109/VPA.2014.10","url":null,"abstract":"Understanding the interactions between a parallel application and the interconnection network over which it exchanges data is critical to optimizing performance in modern supercomputers. However, recent supercomputing architectures use networks that do not have natural low-dimensional representations, making them difficult to comprehend or visualize. In particular, high-dimensional torus networks are common and are used in four of the top ten supercomputers and eight of the top ten on the Graph500 list. We present a new visualization of five-dimensional torus networks. We use four connected views depicting the network at different levels of detail, allowing analysts to observe general large-scale traffic patterns while simultaneously viewing individual links or outliers in any specific section of the network. We demonstrate this approach by analyzing network traffic for a pF3D simulation running on the IBM Blue Gene/Q architecture, and show how it is both intuitive and effective for understanding and optimizing parallel application behavior.","PeriodicalId":160141,"journal":{"name":"2014 First Workshop on Visual Performance Analysis","volume":"74 7","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120935344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Linking Performance Data into Scientific Visualization Tools","authors":"K. Huck, Kristin C. Potter, D. Jacobsen, H. Childs, A. Malony","doi":"10.1109/VPA.2014.9","DOIUrl":"https://doi.org/10.1109/VPA.2014.9","url":null,"abstract":"Understanding the performance of program execution is essential when optimizing simulations run on high-performance supercomputers. Instrumenting and profiling codes is itself a difficult task and interpreting the resulting complex data is often facilitated through visualization of the gathered measures. However, these measures typically ignore spatial information specific to a simulation, which may contain useful knowledge on program behavior. Linking the instrumentation data to the visualization of performance within a spatial context is not straightforward as information needed to create the visualizations is not, by default, included in data collection, and the typical visualization approaches do not address spatial concerns. In this work, we present an approach that links the collection of spatially-aware performance data to a visualization paradigm through both analysis and visualization abstractions to facilitate better understanding of performance in the spatial context of the simulation. Because the potential costs for such a system are quite high, we leverage existing performance profiling and visualization systems and demonstrate their combined potential on climate simulation.","PeriodicalId":160141,"journal":{"name":"2014 First Workshop on Visual Performance Analysis","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116109253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Visualization of Memory Access Behavior on Hierarchical NUMA Architectures","authors":"B. Weyers, C. Terboven, Dirk Schmidl, Joachim Herber, T. Kuhlen, Matthias S. Müller, B. Hentschel","doi":"10.1109/VPA.2014.12","DOIUrl":"https://doi.org/10.1109/VPA.2014.12","url":null,"abstract":"The available memory bandwidth of existing high performance computing platforms turns out as being more and more the limitation to various applications. Therefore, modern microarchitectures integrate the memory controller on the processor chip, which leads to a non-uniform memory access behavior of such systems. This access behavior in turn entails major challenges in the development of shared memory parallel applications. An improperly implemented memory access functionality results in a bad ratio between local and remote memory access, and causes low performance on such architectures. To address this problem, the developers of such applications rely on tools to make these kinds of performance problems visible. This work presents a new tool for the visualization of performance data of the non-uniform memory access behavior. Because of the visual design of the tool, the developer is able to judge the severity of remote memory access in a time-dependent simulation, which is currently not possible using existing tools.","PeriodicalId":160141,"journal":{"name":"2014 First Workshop on Visual Performance Analysis","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115059661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}