X. Jia, Yiqun Xie, Sheng Li, Shengyu Chen, J. Zwart, J. Sadler, A. Appling, S. Oliver, J. Read
2021 IEEE International Conference on Data Mining (ICDM), published 2021-12-01. DOI: https://doi.org/10.1109/ICDM51629.2021.00037
Physics-Guided Machine Learning from Simulation Data: An Application in Modeling Lake and River Systems
This paper proposes a new physics-guided machine learning approach that incorporates the scientific knowledge in physics-based models into machine learning models. Physics-based models are widely used to study dynamical systems in a variety of scientific and engineering problems. Although they are built on general physical laws that govern the relations from input to output variables, these models often produce biased simulations due to inaccurate parameterizations or approximations used to represent the true physics. In this paper, we aim to build a new data-driven framework to monitor dynamical systems by extracting general scientific knowledge embodied in simulation data generated by the physics-based model. To handle the bias in simulation data caused by imperfect parameterization, we propose to extract general physical relations jointly from multiple sets of simulations generated by a physics-based model under different physical parameters. In particular, we develop a spatio-temporal network architecture that uses its gating variables to capture the variation of physical parameters. We initialize this model using a pre-training strategy that helps discover common physical patterns shared by different sets of simulation data. Then we fine-tune it using limited observation data via a contrastive learning process. By leveraging the complementary strengths of machine learning and domain knowledge, our method has been shown to produce accurate predictions, use fewer training samples, and generalize to out-of-sample scenarios. We further show that the method can provide insights about the variation of physical parameters over space and time in two domain applications: predicting temperature in streams and predicting temperature in lakes.
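The workflow described in the abstract (simulate under several physical parameters, learn the relation shared across the simulation sets, then adjust with a few observations) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the paper's method: the relaxation model in `simulate`, the scalar "gate" fitted by `fit_gate`, and the grid search in `calibrate` are hypothetical stand-ins. In particular, the sketch uses plain least squares where the paper uses a spatio-temporal network and a contrastive fine-tuning process.

```python
import numpy as np


def simulate(air, k, t0=10.0):
    """Hypothetical physics model: water temperature relaxes toward air temperature at rate k."""
    temps = np.empty(len(air))
    temps[0] = t0
    for t in range(1, len(air)):
        temps[t] = temps[t - 1] + k * (air[t - 1] - temps[t - 1])
    return temps


def fit_gate(air, temps):
    """Least-squares estimate of a single scalar gate g in T[t+1] - T[t] = g * (air[t] - T[t])."""
    x = air[:-1] - temps[:-1]
    y = np.diff(temps)
    return float(x @ y / (x @ x))


def calibrate(air, obs_idx, obs_vals, candidates):
    """Pick the candidate rate whose simulation best matches the sparse observations (grid search)."""
    errors = [np.mean((simulate(air, k)[obs_idx] - obs_vals) ** 2) for k in candidates]
    return float(candidates[int(np.argmin(errors))])


rng = np.random.default_rng(0)
days = np.arange(365)
# Synthetic driver: a seasonal air temperature cycle plus daily noise (made-up numbers).
air = 15.0 + 10.0 * np.sin(2 * np.pi * days / 365) + rng.normal(0.0, 1.0, size=365)

# Step 1: several simulation sets, each generated under a different physical parameter.
sim_rates = [0.05, 0.1, 0.2, 0.4]
simulations = {k: simulate(air, k) for k in sim_rates}

# Step 2: the same relation holds in every set; only the gate value changes with the parameter.
recovered = {k: fit_gate(air, temps) for k, temps in simulations.items()}

# Step 3: a small number of noisy observations from a system whose true rate was never simulated.
true_rate = 0.15
obs_idx = rng.choice(365, size=30, replace=False)
obs_vals = simulate(air, true_rate)[obs_idx] + rng.normal(0.0, 0.1, size=30)
candidates = np.linspace(0.01, 0.5, 50)
estimated_rate = calibrate(air, obs_idx, obs_vals, candidates)
```

In this toy setting the fitted gate in each simulation set matches the rate that generated it, which mirrors the idea of a gating variable tracking a physical parameter. The 30 sparse observations are enough to place the unseen system between the simulated rates.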