Dorra Ben Khalifa, Xinyi Li, I. Laguna, M. Martel, G. Gopalakrishnan
{"title":"Toward Increasing Trust in Exascale Simulations","authors":"Dorra Ben Khalifa, Xinyi Li, I. Laguna, M. Martel, G. Gopalakrishnan","doi":"10.1109/XLOOP56614.2022.00010","DOIUrl":"https://doi.org/10.1109/XLOOP56614.2022.00010","url":null,"abstract":"In recent decades, High Performance Computing (HPC) and simulations have become decisive in many areas of engineering and science. Since many HPC applications rely extensively on floating-point arithmetic to solve computational problems, many kinds of numerical errors can be introduced during program execution, leading to instability or reproducibility problems. One such error source is cancellation, the loss of significant digits that produces inaccurate results when two nearby numbers are subtracted. In this article, we present Candy, a new dynamic library based on code instrumentation that detects cancellations in numerical software. The originality of our method is to compute the number of significant bits of floating-point numbers in a generalized framework by attaching a higher-precision shadow value to each number. This helps to detect accurately whether a program suffers from cancellation problems and thus to increase trust in large-scale HPC applications and exascale simulations. We evaluate Candy on a set of complex, real-world numerical applications. 
In addition, we compare our method against the state-of-the-art tool FPChecker in terms of efficiency, mixed-precision results, and speed of analysis.","PeriodicalId":401106,"journal":{"name":"2022 4th Annual Workshop on Extreme-scale Experiment-in-the-Loop Computing (XLOOP)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126984448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hongwei Chen, Sathya R. Chitturi, Rajan Plumley, L. Shen, Nathan C. Drucker, N. Burdet, Cheng Peng, Sougata Mardanya, D. Ratner, A. Mishra, C. Yoon, Sanghoon Song, M. Chollet, G. Fabbris, Mike Dunne, S. Nelson, Mingda Li, A. Lindenberg, Chunjing Jia, Y. Nashed, A. Bansil, Sugata Chowdhury, A. Feiguin, J. Turner, Jana Thayer
{"title":"Testing the data framework for an AI algorithm in preparation for high data rate X-ray facilities","authors":"Hongwei Chen, Sathya R. Chitturi, Rajan Plumley, L. Shen, Nathan C. Drucker, N. Burdet, Cheng Peng, Sougata Mardanya, D. Ratner, A. Mishra, C. Yoon, Sanghoon Song, M. Chollet, G. Fabbris, Mike Dunne, S. Nelson, Mingda Li, A. Lindenberg, Chunjing Jia, Y. Nashed, A. Bansil, Sugata Chowdhury, A. Feiguin, J. Turner, Jana Thayer","doi":"10.1109/XLOOP56614.2022.00006","DOIUrl":"https://doi.org/10.1109/XLOOP56614.2022.00006","url":null,"abstract":"Next-generation X-ray free electron lasers will be capable of delivering X-rays continuously at a repetition rate approaching 1 MHz. This will require the development of data systems to handle experiments at these types of facilities, especially for high-throughput applications such as femtosecond X-ray crystallography and X-ray photon fluctuation spectroscopy. Here, we demonstrate a framework that captures single-shot X-ray data at the LCLS and implements a machine-learning algorithm to automatically extract the contrast parameter from the collected data. We measure the time required to return the results and assess the feasibility of using this framework at high data volume. 
We use this experiment to determine the feasibility of solutions for ‘live’ data analysis at the MHz repetition rate.","PeriodicalId":401106,"journal":{"name":"2022 4th Annual Workshop on Extreme-scale Experiment-in-the-Loop Computing (XLOOP)","volume":"696 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133398731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
F. Würthwein, J. Guiang, A. Arora, Diego Davila, John Graham, D. Mishin, Tom Hutton, I. Sfiligoi, Harvey Newman, J. Balcas, T. Lehman, Xi Yang, C. Guok
{"title":"Managed Network Services for Exascale Data Movement Across Large Global Scientific Collaborations","authors":"F. Würthwein, J. Guiang, A. Arora, Diego Davila, John Graham, D. Mishin, Tom Hutton, I. Sfiligoi, Harvey Newman, J. Balcas, T. Lehman, Xi Yang, C. Guok","doi":"10.1109/XLOOP56614.2022.00008","DOIUrl":"https://doi.org/10.1109/XLOOP56614.2022.00008","url":null,"abstract":"Unique scientific instruments designed and operated by large global collaborations are expected to produce Exabyte-scale data volumes per year by 2030. These collaborations depend on globally distributed storage and compute to turn raw data into science. While all of these infrastructures have batch scheduling capabilities to share compute, Research and Education networks lack those capabilities. There is thus uncontrolled competition for bandwidth between and within collaborations. As a result, data “hogs” disk space at processing facilities for much longer than it takes to process, leading to vastly over-provisioned storage infrastructures. Integrated co-scheduling of networks as part of high-level managed workflows might reduce these storage needs by more than an order of magnitude. 
This paper describes such a solution, demonstrates its functionality in the context of the Large Hadron Collider (LHC) at CERN, and presents the next steps towards its use in production.","PeriodicalId":401106,"journal":{"name":"2022 4th Annual Workshop on Extreme-scale Experiment-in-the-Loop Computing (XLOOP)","volume":"99 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133391975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhuowen Zhao, Tanny Chavez, Elizabeth Holman, Guanhua Hao, A. Green, Harinarayan Krishnan, Dylan McReynolds, R. Pandolfi, Eric J. Roberts, Petrus H. Zwart, Howard Yanxon, N. Schwarz, S. Sankaranarayanan, S. Kalinin, Apurva Mehta, Stuart Campbell, A. Hexemer
{"title":"MLExchange: A web-based platform enabling exchangeable machine learning workflows for scientific studies","authors":"Zhuowen Zhao, Tanny Chavez, Elizabeth Holman, Guanhua Hao, A. Green, Harinarayan Krishnan, Dylan McReynolds, R. Pandolfi, Eric J. Roberts, Petrus H. Zwart, Howard Yanxon, N. Schwarz, S. Sankaranarayanan, S. Kalinin, Apurva Mehta, Stuart Campbell, A. Hexemer","doi":"10.1109/XLOOP56614.2022.00007","DOIUrl":"https://doi.org/10.1109/XLOOP56614.2022.00007","url":null,"abstract":"Machine learning (ML) algorithms are increasingly helping scientific communities across different disciplines and institutions to address large and diverse data problems. However, many available ML tools are programmatically demanding and computationally costly. The MLExchange project aims to build a collaborative platform equipped with enabling tools that allow scientists and facility users who do not have a profound ML background to use ML and computational resources in scientific discovery. At a high level, we are targeting a full user experience where managing and exchanging ML algorithms, workflows, and data are readily available through web applications. Since each component is an independent container, the whole platform or its individual service(s) can be easily deployed on servers of different scales, ranging from a personal device (laptop, smartphone, etc.) to high performance clusters (HPC) accessed (simultaneously) by many users. 
Thus, MLExchange supports flexible usage scenarios: users can either access the services and resources from a remote server or run the whole platform or its individual service(s) within their local network.","PeriodicalId":401106,"journal":{"name":"2022 4th Annual Workshop on Extreme-scale Experiment-in-the-Loop Computing (XLOOP)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125038195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Maureen Dougherty, M. Zink, J. V. Oehsen, Kenneth Dalenberg, Bala Desinghu, J. Kaelber, Jeremy Schafer, J. Goodhue, Wolf Hey, Morgan Ludgwig, Boyd Wilson, Coleman McKnight
{"title":"The ERN Cryo-EM Federated Instrument Pilot Project: Phase 1","authors":"Maureen Dougherty, M. Zink, J. V. Oehsen, Kenneth Dalenberg, Bala Desinghu, J. Kaelber, Jeremy Schafer, J. Goodhue, Wolf Hey, Morgan Ludgwig, Boyd Wilson, Coleman McKnight","doi":"10.1109/XLOOP56614.2022.00009","DOIUrl":"https://doi.org/10.1109/XLOOP56614.2022.00009","url":null,"abstract":"The Ecosystem for Research Networking (ERN) CryoEM Remote Instrument Pilot Project was launched in response to feedback and survey data collected from hundreds of participants in the ERN series of NSF-funded (OAC-2018927) community outreach events, which revealed that Structural Biology instrument-driven science is being forced to transition from self-contained islands to federated, wide-area, internet-accessible instruments. Its goal is to facilitate multi-institutional collaboration at the interface of computing and electron microscopy through the implementation of the ERN Federated OpenCI Lab’s Instrument CI Cloudlet design. The end result will be a web-based portal leveraging federated access to the instrument, workflows utilizing edge computing in conjunction with cloud computing, and real-time monitoring for experimental parameter adjustments and decisions. The intention is to foster team science and scientific innovation, with emphasis on under-represented and under-resourced institutions, through the democratization of these scientific instruments. 
This paper discusses the latest Phase 1 deployment efforts.","PeriodicalId":401106,"journal":{"name":"2022 4th Annual Workshop on Extreme-scale Experiment-in-the-Loop Computing (XLOOP)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125085887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}