"iData: a community geospatial data sharing environment to support data-driven science"
Rajesh Kalyanam, Lan Zhao, Carol X. Song, Yuet Ling Wong, Jaewoo Lee, Nelson B. Villoria
DOI: 10.1145/2484762.2484813
Proceedings of the Conference on Extreme Science and Engineering Discovery Environment: Gateway to Discovery (XSEDE '13), July 22, 2013

Abstract: With the advent of XSEDE, the national cyberinfrastructure has evolved from a set of traditional HPC resources into a broader range of digital services. Science gateways, which serve as portals to scientific applications, have also evolved as researchers deal with rapidly expanding scientific datasets and increasingly complex workflows. More and more gateways are being developed to support integrated services for running data-driven applications on HPC resources such as those on XSEDE. To facilitate this type of workflow, there is a pressing need for web-based data management systems that are easy to use; support data upload, sharing, access, and management; and can be integrated with advanced computation and storage resources. More importantly, such systems need to be accessible to users from the broad research and education communities. In this paper, we describe the design and implementation of iData, a web-based community data publishing and sharing system. iData supports both generic file-based data collections and several commonly used environmental data formats, including time series and GIS vector and raster data. Integrated data processing, visualization, and filtering capabilities are provided for these formats. Currently iData can be downloaded and deployed in a HUBzero-based gateway, and we plan to make it available for non-HUBzero platforms in the future. We present two examples in which iData has been successfully used to support research collaboration in the driNET and GEOSHARE projects.
"Optimizing utilization across XSEDE platforms"
Haihang You, Charng-Da Lu, Ziliang Zhao, Fei Xing
DOI: 10.1145/2484762.2484778

Abstract: HPC resources provided by XSEDE give researchers unique opportunities to carry out scientific studies. As of 2013, XSEDE consists of 16 systems with varied architectural designs and capabilities. This hardware heterogeneity and software diversity make efficient utilization of such a federation of computing resources very challenging. For example, users are constantly faced with a myriad of possibilities for building and running an application: compilers, numerical libraries, and runtime parameters. In this paper we report performance data for several popular scientific applications built with the different compilers and numerical libraries available on two XSEDE systems, Kraken and Gordon, and suggest the best way to compile applications for optimal performance. Through this comparison, we validate the SU conversion factors between these XSEDE systems from the application's viewpoint.
"Preliminary experiences with the Uintah framework on Intel Xeon Phi and Stampede"
Qingyu Meng, A. Humphrey, John A. Schmidt, M. Berzins
DOI: 10.1145/2484762.2484779

Abstract: In this work, we describe our preliminary experiences on the Stampede system in the context of the Uintah Computational Framework. Uintah was developed to provide an environment for solving a broad class of fluid-structure interaction problems on structured adaptive grids. Uintah uses a combination of fluid-flow solvers and particle-based methods, together with a novel asynchronous task-based approach and fully automated load balancing. While we have designed scalable Uintah runtime systems for large CPU core counts, the emergence of heterogeneous systems presents considerable challenges in effectively utilizing additional on-node accelerators and co-processors, managing deep memory hierarchies, and handling multiple levels of parallelism. Our recent work addressed the emergence of heterogeneous CPU/GPU systems with the design of a unified heterogeneous runtime system, enabling Uintah to fully exploit these architectures with support for asynchronous, out-of-order scheduling of both CPU and GPU computational tasks. Using this design, Uintah has run at full scale on the Keeneland system and TitanDev. With the release of the Intel Xeon Phi co-processor and the recent availability of the Stampede system, we show that Uintah can be modified to utilize such a co-processor-based system. We also explore the different usage models provided by the Xeon Phi, with the aim of understanding the portability of a general-purpose framework like Uintah to this architecture. These usage models range from the pragma-based offload model to the more complex symmetric model, which utilizes all co-processor and host CPU cores simultaneously. We provide preliminary results of the various usage models for a challenging adaptive mesh refinement problem, as well as a detailed account of our experience adapting Uintah to run on the Stampede system. Our conclusion is that while the Stampede system is easy to use, obtaining high performance from the Xeon Phi co-processors requires a substantial investment, different from that needed for GPU-based systems.
"Optimizing the PCIT algorithm on Stampede's Xeon and Xeon Phi processors for faster discovery of biological networks"
L. Koesterke, K. Milfeld, M. Vaughn, D. Stanzione, J. Koltes, N. Weeks, J. Reecy
DOI: 10.1145/2484762.2484794

Abstract: The PCIT method is an important technique for detecting interactions in networks. The PCIT algorithm has been used in biological contexts to infer complex regulatory mechanisms and interactions in genetic networks, in genome-wide association studies, and in other similar problems. In this work, the PCIT algorithm is re-implemented with exemplary parallel, vector, I/O, memory, and instruction optimizations for today's multi- and many-core architectures. The evolution and performance of the new code target the processor architectures of the Stampede supercomputer, but will also benefit other architectures. The Stampede system consists of an Intel Xeon E5 processor base system with an innovative component comprised of Intel Xeon Phi co-processors. Optimized results and an analysis are presented for both the Xeon and the Xeon Phi.
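At its core, PCIT examines every trio of variables (x, y, z) and asks whether the correlation between x and y survives after controlling for z, using first-order partial correlations. The sketch below illustrates that inner computation only; it is not the authors' optimized implementation, and the keep/drop threshold is a simplified stand-in for the information-theoretic tolerance used in the published algorithm.

```python
import numpy as np

def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

def pcit_keep(data: np.ndarray) -> np.ndarray:
    """Toy PCIT pass: keep edge (i, j) only if its correlation survives
    conditioning on every other variable z. The 0.5 factor below is an
    illustrative tolerance, not the algorithm's information-theoretic one."""
    r = np.corrcoef(data)            # variables in rows, samples in columns
    n = r.shape[0]
    keep = np.ones((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + 1, n):
            for z in range(n):
                if z in (i, j):
                    continue
                p = partial_corr(r[i, j], r[i, z], r[j, z])
                if abs(p) < 0.5 * abs(r[i, j]):
                    keep[i, j] = keep[j, i] = False
    return keep
```

The triple loop over (i, j, z) is O(n^3) in the number of variables, which is exactly why the paper's parallel, vector, and memory optimizations matter for genome-scale networks.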
"1000 words: advanced visualization for the humanities"
Rob Turknett, Brandt M. Westing, Samuel Moore
DOI: 10.1145/2484762.2484835

Abstract: 1000 Words is a project to enable discoveries at extreme scale in the humanities. Funded by the National Endowment for the Humanities (NEH), this project aims to make advanced visualization systems attached to high performance computing resources both useful and usable for scholars in the arts and humanities. This paper describes Massive Pixel Environment (MPE), our initial effort toward this goal. Massive Pixel Environment is a software library developed at the Texas Advanced Computing Center (TACC) for extending Processing sketches to multi-node tiled displays. Processing is an open source programming language and environment for creating images, animations, and interactions. MPE significantly lowers the learning curve and time needed to develop software and interactive visualizations for multi-node tiled displays. We will discuss the applications and implications of MPE for the sciences, humanities, and media arts.
"Providing a supported online course on parallel computing"
S. Gordon, Jay Alameda, J. Demmel, R. Carbunescu, Susan Mehringer
DOI: 10.1145/2484762.2484765

Abstract: Learning the principles of computational modeling and parallel computing requires more than a short workshop. Workshops generally run from a few hours to a few days and are therefore limited in the amount of material that can be covered. In addition, it is more difficult for participants to retain large amounts of new material under the time pressures of a workshop. Deeper understanding of such complex material can come from more traditional academic courses, yet many institutions lack either the expertise or the curriculum flexibility to offer them. In the spring of 2013 we offered the equivalent of a full-semester course, entitled Applications of Parallel Computing, as an open, online course in an effort to address these issues. The course was offered over a period of thirteen weeks using materials captured from the University of California, Berkeley course CS267. Enrollment was initially limited to 345 students. Creating and implementing the course involved decisions in several areas: the design of the instructional materials, creating an environment for running programming assignments, support mechanisms for the large number of students taking the course, and automatic grading of assignments. In this session, we will present a summary of our experience in addressing these questions, along with an evaluation of the course outcomes.
"A tale of three outreach programs: strategic collaboration across XSEDE outreach services"
L. Akli, R. Kravetz, R. Moye
DOI: 10.1145/2484762.2484799

Abstract: This is a tale of how three outreach programs with very different missions have enhanced the impact of the XSEDE Scholars Program through strong collaboration. The XSEDE Scholars Program (XSP) is a program for U.S. students from underrepresented groups in the area of computational sciences that provides opportunities to learn more about high performance computing and XSEDE resources and to network with cutting-edge researchers and professional leaders. The mission of the XSEDE MSI Outreach program is to expand the number of faculty from Minority Serving Institutions (MSIs) and underrepresented groups engaged in the use of XSEDE resources, HPC, and computational science and engineering. The XSEDE Campus Champions program strengthens the connection between campuses and XSEDE by supporting campus representatives as a local source of knowledge about high-performance and high-throughput computing and other digital services, opportunities, and resources.
"Using XDMoD to facilitate XSEDE operations, planning and analysis"
T. Furlani, Barry L. Schneider, Matthew D. Jones, John Towns, David L. Hart, S. Gallo, R. L. Deleon, Charng-Da Lu, Amin Ghadersohi, Ryan J. Gentner, A. Patra, G. Laszewski, Fugang Wang, Jeffrey T. Palmer, N. Simakov
DOI: 10.1145/2484762.2484763

Abstract: The XDMoD auditing tool provides, for the first time, a comprehensive tool to measure both the utilization and the performance of high-end cyberinfrastructure (CI), with an initial focus on XSEDE. Here we demonstrate, through several case studies, its utility for providing important metrics regarding the resource utilization and performance of TeraGrid/XSEDE that can be used for detailed analysis and planning as well as for improving operational efficiency and performance. Measuring the utilization of high-end cyberinfrastructure such as XSEDE provides a detailed understanding of how a given CI resource is being utilized and can lead to improved performance of the resource in terms of job throughput or any number of desired job characteristics. In the case studies considered here, a detailed historical analysis of XSEDE usage data using XDMoD clearly demonstrates the tremendous growth in the number of users, in overall usage, and in the scale of the simulations routinely carried out. Not surprisingly, physics, chemistry, and the engineering disciplines are shown to be heavy users of the resources. However, as the data clearly show, the molecular biosciences are now a significant and growing user of XSEDE resources, accounting for more than 20 percent of all SUs consumed in 2012. XDMoD shows that the resources required by the various scientific disciplines are very different: physics, the astronomical sciences, and the atmospheric sciences tend to solve large problems requiring many cores, while molecular bioscience applications require many cycles but do not employ core counts that are as large. Such distinctions are important in guiding future cyberinfrastructure design decisions. XDMoD's implementation of a novel application-kernel-based auditing system to measure overall CI system performance and quality of service is shown, through several examples, to provide a useful means of automatically detecting underperforming hardware and software. This capability is especially critical given the complex composition of today's advanced CI. One example is an application kernel, based on a widely used quantum chemistry program, that uncovered a software bug in the I/O stack of a commercial parallel file system; the vendor subsequently fixed it with a software patch that is now part of their standard release. This error, which resulted in dramatically increased execution times as well as outright job failures, would likely have gone unnoticed for some time and was uncovered only through XDMoD's suite of application kernels.
"FluMapper: an interactive CyberGIS environment for massive location-based social media data analysis"
Anand Padmanabhan, Shaowen Wang, G. Cao, Myunghwa Hwang, Yanli Zhao, Zhenhua Zhang, Yizhao Gao
DOI: 10.1145/2484762.2484821

Abstract: Social media, such as social networks (e.g., Facebook) and microblogs (e.g., Twitter), have experienced a spectacular rise in popularity, attracting hundreds of millions of users and generating an unprecedented amount of information. Twitter, for example, had rapidly gained approximately 500 million registered users as of 2012, generating 340 million tweets daily. Although each tweet is limited to only 140 characters, the aggregate of millions of tweets may provide a realistic representation of the landscape for a certain topic of interest. Furthermore, with the widespread use of location-aware mobile devices, users are sharing their whereabouts through social media services. This has resulted in a dramatic increase in the volume of spatial data, which is becoming a crucial attribute of social media. Location-based social media could thus provide valuable insights for understanding many geographic phenomena. Recent studies capitalizing on social networking and media data show significant societal impacts in many areas, including infectious disease tracking [1].
"Exploiting MapReduce and data compression for data-intensive applications"
Guangchen Ruan, Hui Zhang, Beth Plale
DOI: 10.1145/2484762.2484785

Abstract: HPC platforms show good success for predominantly compute-intensive jobs; however, data-intensive jobs still struggle on HPC platforms, as large amounts of concurrent data movement from I/O nodes to compute nodes can easily saturate the network links. MapReduce, the "moving computation to data" paradigm for many pleasingly parallel applications, assumes that data are resident on local disks and that computation is scheduled where the data are located. However, on an HPC machine data must be staged from a broader file system (such as Lustre) to HDFS, where it can be accessed; this staging can represent a substantial delay in processing. In this paper we look at data compression's effect on reducing the bandwidth needed to get data to the application, as well as its impact on the overall performance of data-intensive applications. Our study examines two types of applications running on XSEDE resources: a 3D time-series caries lesion assessment focusing on a large-scale medical image dataset, and an HTRC word-counting task concerning large-scale text analysis. Our extensive experimental results demonstrate significant improvements in storage space, data stage-in time, and job execution time.
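The trade-off studied above — spending CPU cycles on compression so that fewer bytes cross the network during stage-in — can be illustrated generically. This sketch uses Python's gzip module with synthetic repetitive data, not the authors' pipeline or datasets:

```python
import gzip
import time

def compress_for_stage_in(payload: bytes, level: int = 6):
    """Compress a payload before staging it to HDFS-style storage and
    report the space saved versus the compression time paid."""
    t0 = time.perf_counter()
    compressed = gzip.compress(payload, compresslevel=level)
    elapsed = time.perf_counter() - t0
    ratio = len(compressed) / len(payload)   # smaller ratio = fewer bytes moved
    return compressed, ratio, elapsed

# Highly redundant data (sparse images, plain text) compresses well, so the
# bytes actually staged from the shared file system to HDFS shrink sharply.
payload = b"ACGT" * 250_000                  # ~1 MB of repetitive "data"
compressed, ratio, elapsed = compress_for_stage_in(payload)
assert gzip.decompress(compressed) == payload   # lossless round trip
print(f"compression ratio: {ratio:.4f}, compression time: {elapsed:.3f}s")
```

Whether this wins overall depends on the data: the paper's point is that for redundant datasets the stage-in savings can dominate the compression cost, while incompressible data would pay the CPU cost for little bandwidth benefit.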