Informed machine learning to reconcile interpretability with fidelity in scientific applications

Andrea Murari, Riccardo Rossi, Luca Spolladore, Ivan Wyss, Michela Gelfusa

Artificial Intelligence Review, vol. 58, no. 10 (published 2025-07-19)
DOI: 10.1007/s10462-025-11282-y
Article: https://link.springer.com/article/10.1007/s10462-025-11282-y
PDF: https://link.springer.com/content/pdf/10.1007/s10462-025-11282-y.pdf
Citations: 0
Abstract
Notwithstanding their impressive performance, some of the most powerful machine learning (ML) models are opaque and almost impossible to interpret. Consequently, in recent years there has been a rapid increase in research on eXplainable Artificial Intelligence, which aims to improve model transparency. In scientific applications, explainability takes on a different flavour and cannot be reduced to mere user understanding: a premium is also placed on fidelity, on developing models that reflect the actual mechanisms at play in the investigated phenomena. To this end, Genetic Programming-supported Symbolic Regression (GPSR), explicitly conceived to manipulate symbols, offers various competitive advantages in finding a good trade-off between interpretability and realism. However, the search spaces are typically too large, and the algorithms have to be steered to converge on the desired solutions. The present work describes techniques to constrain GPSR and to combine it with deep learning tools, so that the final models are expressed as interpretable and realistic mathematical equations. The strategies to guide convergence include dimensional analysis, integration of prior information about symmetries and conservation laws, refinements of the fitness function, and robust statistics. Performance improves according to all the main metrics: accuracy, robustness against noise and outliers, ability to handle data sparsity, and interpretability. Great attention has been paid to introducing practical solutions covering the most essential aspects of the data analysis process, from the treatment of uncertainties to the quantification of equation complexity. All the main applications of supervised ML, from regression to classification, are considered (and the extension to unsupervised and reinforcement learning is not expected to pose major difficulties). Theoretical considerations, systematic numerical tests, simulations with multiphysics codes, and the results of actual experiments demonstrate the potential of the proposed improvements.
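The abstract names concrete ingredients for steering GPSR: a refined fitness function, robust statistics, dimensional analysis, and a measure of equation complexity. As a rough illustration only (not the authors' actual implementation), the Python sketch below combines these into a single fitness score; the `candidate` interface (`predict`, `units`, `n_nodes`), the unit encoding, and the weights `alpha`/`beta` are all hypothetical assumptions.

```python
import numpy as np

def robust_error(y_true, y_pred):
    """Median absolute residual: far less sensitive to outliers than MSE,
    one possible 'robust statistics' refinement of the fitness function."""
    return np.median(np.abs(y_true - y_pred))

def dimensional_penalty(candidate_units, target_units):
    """Units tracked as vectors of base-dimension exponents, e.g.
    (length, mass, time); any mismatch with the target units is penalised,
    implementing a dimensional-analysis constraint."""
    return np.sum(np.abs(np.asarray(candidate_units) - np.asarray(target_units)))

def fitness(candidate, X, y, target_units, alpha=10.0, beta=0.01):
    """Lower is better. alpha weights dimensional consistency,
    beta weights expression complexity (node count of the symbolic tree).
    The candidate interface (predict/units/n_nodes) is hypothetical."""
    y_pred = candidate.predict(X)
    return (robust_error(y, y_pred)
            + alpha * dimensional_penalty(candidate.units, target_units)
            + beta * candidate.n_nodes)
```

Under such a score, candidates that violate dimensional consistency or grow overly complex are ranked worse, steering the genetic search toward interpretable, physically plausible equations, in the spirit of the strategies listed above.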
About the journal:
Artificial Intelligence Review, a fully open access journal, publishes cutting-edge research in artificial intelligence and cognitive science. It features critical evaluations of applications, techniques, and algorithms, providing a platform for both researchers and application developers. The journal includes refereed survey and tutorial articles, along with reviews and commentary on significant developments in the field.