HUST '15 | Pub Date: 2015-11-15 | DOI: 10.1145/2834996.2834998
R. McLay, D. James, Si Liu, Todd Evans, B. Barth, A. Lamas-Linares, R. Budiardja, M. Fahey
{"title":"Tales from the trenches: can user support tools make a difference?","authors":"R. McLay, D. James, Si Liu, Todd Evans, B. Barth, A. Lamas-Linares, R. Budiardja, M. Fahey","doi":"10.1145/2834996.2834998","DOIUrl":"https://doi.org/10.1145/2834996.2834998","url":null,"abstract":"Those participating in HUST '15 appreciate the potential value of high quality tools and technologies intended to enhance the user experience on large-scale computers. Given the changing landscape of large-scale computing, they likely also feel a renewed sense of urgency associated with the emerging needs of an increasingly diverse user base. Do user support tools, however, really make a difference for users and those who support them? Are they making a difference now? This paper is one response to these questions: it describes five brief case studies in which the authors employed software tools to help address problems and issues associated with users' needs on the Stampede supercomputer at the Texas Advanced Computing Center. The paper highlights the authors' thoughts on their experiences, including a few deliberately provocative assertions intended to invite reflection and discussion.","PeriodicalId":428233,"journal":{"name":"HUST '15","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121403025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
HUST '15 | Pub Date: 2015-11-15 | DOI: 10.1145/2834996.2834999
C. Rosales, A. Gómez-Iglesias, Andrew Predoehl
{"title":"Remora: a resource monitoring tool for everyone","authors":"C. Rosales, A. Gómez-Iglesias, Andrew Predoehl","doi":"10.1145/2834996.2834999","DOIUrl":"https://doi.org/10.1145/2834996.2834999","url":null,"abstract":"Users of high performance systems often ask about the requirements of their HPC applications. However, answering this question requires the collaboration of administrators, and sometimes the answer does not contain the amount of detail that users demand. This work introduces a new user-space resource monitoring tool, REMORA. REMORA stands for REsource MOnitoring for Remote Applications and provides a simple interface to gather important system utilization data while running on HPC systems. REMORA is designed to provide a brief text report and post-processing tools to analyze the very detailed records taken during an application run. Users can configure the tool to achieve the amount of detail that they want and perform the analysis of the results at any point in time. REMORA helps users achieve a better understanding of their applications by providing a high-level profile of their executions, and users can take advantage of that information to improve their codes.","PeriodicalId":428233,"journal":{"name":"HUST '15","volume":"64 20","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121006203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
HUST '15 | Pub Date: 2015-11-15 | DOI: 10.1145/2834996.2835000
R. Budiardja, M. Fahey, R. McLay, Prasad Maddumage Don, B. Hadri, D. James
{"title":"Community use of XALT in its first year in production","authors":"R. Budiardja, M. Fahey, R. McLay, Prasad Maddumage Don, B. Hadri, D. James","doi":"10.1145/2834996.2835000","DOIUrl":"https://doi.org/10.1145/2834996.2835000","url":null,"abstract":"XALT collects accurate, detailed, and continuous job-level and link-time data and stores that data in a database; all the data collection is transparent to the users. The data stored can be mined to generate a picture of the compilers, libraries, and other software that users need to run their jobs successfully, highlighting the products that researchers use. We showcase how data collected by XALT can be easily mined into a digestible format by presenting data from four separate HPC centers. XALT is already used by many HPC centers around the world due to its usefulness and complementariness to existing logs and databases. Centers with XALT have a much better understanding of library and executable usage and patterns. We also present new functionality in XALT - namely the ability to anonymize data and early work in providing seamless access to provenance data.","PeriodicalId":428233,"journal":{"name":"HUST '15","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129818429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
HUST '15 | Pub Date: 2015-11-15 | DOI: 10.1145/2834996.2835001
Erich Birngruber, Petar Forai, Aaron Zauner
{"title":"Total recall: holistic metrics for broad systems performance and user experience visibility in a data-intensive computing environment","authors":"Erich Birngruber, Petar Forai, Aaron Zauner","doi":"10.1145/2834996.2835001","DOIUrl":"https://doi.org/10.1145/2834996.2835001","url":null,"abstract":"User support personnel, systems engineers, and administrators of HPC installations need to be aware of log and telemetry information from different systems in order to perform routine tasks ranging from systems management to user inquiries. We present an integrated, distributed, HPC-tailored monitoring system, based on a current-generation software stack from the DevOps community, with integration into the workload management system. The goal of this system is to provide a quicker turnaround time for user inquiries in response to errors. Dashboards provide an overlay of system- and node-level events on top of correlated metrics data. This information is directly available for querying, manipulation, and filtering, allowing statistical analysis and aggregation of collected data. Furthermore, additional dashboards offer insight into how users are interacting with available resources and pinpoint fluctuations in utilization. The system can integrate sources of information from other monitoring solutions and event-based sources.","PeriodicalId":428233,"journal":{"name":"HUST '15","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130964130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
HUST '15 | Pub Date: 2015-11-15 | DOI: 10.1145/2834996.2834997
Paul Z. Kolano
{"title":"Automatically encapsulating HPC best practices into data transfers","authors":"Paul Z. Kolano","doi":"10.1145/2834996.2834997","DOIUrl":"https://doi.org/10.1145/2834996.2834997","url":null,"abstract":"This paper presents the Shift automated transfer tool and the mechanisms it employs to achieve better performance while preserving the stability of HPC environments. Shift encapsulates best practices understood by domain experts during transfers so that scientists can focus on their science without the need to study file transports, resource management, and file systems as well. Shift understands how to utilize the variety of transports that might be deployed throughout a widely distributed user base, how to maximize the performance achievable by each, and the scenarios in which each is most effective. Shift understands which resources are available in a particular HPC environment and how to utilize them for significant performance increases while preventing resource exhaustion. Finally, Shift understands the file systems to which and from which files may be transferred and the nuances to their use that affect performance and stability behind the scenes.","PeriodicalId":428233,"journal":{"name":"HUST '15","volume":"2010 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129127155","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}