{"title":"Integrating scientific knowledge into machine learning using interactive decision trees","authors":"Thorsten Wagener, F. Pianosi","doi":"10.31223/x5pp75","journal":{"name":"Comput. Geosci.","volume":"24 1","pages":"105248"},"publicationDate":"2021-07-24","publicationTypes":"Journal Article","isOpenAccess":false,"citationCount":"7","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.31223/x5pp75"}
Integrating scientific knowledge into machine learning using interactive decision trees
Decision trees (DT) are a machine learning method that has been widely used in the environmental sciences to automatically extract patterns from complex, high-dimensional data. However, like any data-based method, they are hindered by data limitations and can produce physically unrealistic results. We develop interactive DT (iDT) that put the human in the loop and combine experts' scientific knowledge with the ability of algorithms to automatically learn patterns from large data sets. We created a toolbox of methods and visualization techniques that allow users to interact with the DT. Users can create new composite variables, manually change the variable and threshold used for a split, manually prune the tree, and group variables based on their physical meaning. We demonstrate with three case studies that iDT help experts incorporate their knowledge into DT models, achieving higher interpretability and greater physical realism.
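The two interactions named above, creating a composite variable and manually choosing the split variable and threshold, can be illustrated with a minimal sketch. This is not the authors' toolbox: the function names, the toy data, and the `aridity` variable are hypothetical, and Gini impurity is assumed as the split criterion only for illustration. The idea shown is that an expert-chosen split can be scored with the same criterion as the data-driven one, so the cost of the override is visible.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_impurity(rows, labels, feature, threshold):
    """Weighted Gini impurity after splitting on rows[i][feature] <= threshold."""
    left = [y for r, y in zip(rows, labels) if r[feature] <= threshold]
    right = [y for r, y in zip(rows, labels) if r[feature] > threshold]
    return (len(left) * gini(left) + len(right) * gini(right)) / len(labels)


def best_split(rows, labels):
    """Data-driven choice: try every feature and every observed value."""
    candidates = [
        (split_impurity(rows, labels, f, r[f]), f, r[f])
        for f in rows[0]
        for r in rows
    ]
    return min(candidates, key=lambda c: c[0])


# Hypothetical toy data: annual precipitation and potential evaporation (mm).
rows = [
    {"precip": 1200.0, "pet": 600.0},
    {"precip": 900.0, "pet": 700.0},
    {"precip": 500.0, "pet": 1000.0},
    {"precip": 400.0, "pet": 1200.0},
]
labels = ["wet", "wet", "dry", "dry"]

# Split the algorithm would pick from the raw variables alone.
auto_impurity, auto_feature, auto_threshold = best_split(rows, labels)

# Expert step 1: add a composite variable with a physical meaning.
for r in rows:
    r["aridity"] = r["pet"] / r["precip"]

# Expert step 2: override the split with a physically motivated threshold
# and score it with the same criterion as the automatic split.
manual_impurity = split_impurity(rows, labels, "aridity", 1.0)

print("automatic:", auto_feature, auto_threshold, auto_impurity)
print("manual:   aridity 1.0", manual_impurity)
```

On this toy data both splits separate the classes equally well, so the expert can prefer the one whose threshold has a physical interpretation without giving up predictive performance. With real data the two scores would generally differ, and the difference is the price of the manual choice.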