Similarity-Informed Matrix Completion Method for Predicting Activity Coefficients
Nicolas Hayer, Thomas Specht, Justus Arweiler, Hans Hasse and Fabian Jirasek*
The Journal of Physical Chemistry A, vol. 129, no. 13, pp. 3141–3147. Published 2025-03-19. DOI: 10.1021/acs.jpca.4c08360
https://pubs.acs.org/doi/10.1021/acs.jpca.4c08360
Impact factor: 2.7; JCR: Q3 (Chemistry, Physical)
Citations: 0
Abstract
Accurate prediction of thermodynamic properties of mixtures, such as activity coefficients, is essential for designing and optimizing chemical processes. While established physics-based methods face limitations in prediction accuracy and scope, emerging machine learning approaches, such as matrix completion methods (MCMs), offer promising alternatives. However, their performance can suffer in data-sparse regions. To address this issue, we propose a novel hybrid MCM for predicting activity coefficients at infinite dilution at 298 K that not only uses experimental training data but also includes synthetic training data from two sources: predictions obtained from the physics-based modified UNIFAC (Dortmund) and from a similarity-based approach developed in previous work. The resulting hybrid method combines the broad applicability of MCMs with the precision of the similarity-based approach, resulting in a more robust prediction framework that excels even in regions with limited data. Additionally, our analysis provides valuable insights into how different types of training data affect the prediction accuracy. When experimental data are sparse, incorporating synthetic training data from modified UNIFAC (Dortmund) and the similarity-based approach significantly improves the performance of the MCMs. Conversely, even with abundant experimental data, high accuracy is achieved only if the training set includes mixtures similar to those of interest.
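The abstract does not spell out the model, so the following is only a minimal sketch of the general idea of training a matrix completion method on experimental and synthetic entries together; it is not the authors' implementation. It assumes a solute-by-solvent matrix of logarithmic activity coefficients, a plain weighted alternating least squares factorization, and a lower weight for synthetic entries. The function name `complete_matrix`, the rank, the regularization strength, the weights 1.0 and 0.3, and the toy data are all hypothetical.

```python
import numpy as np

def complete_matrix(values, weights, rank=2, reg=0.1, n_iters=50, seed=0):
    """Weighted low-rank matrix completion by alternating least squares.

    values  : (n_solutes, n_solvents) array; entries with weight 0 are ignored
    weights : same shape; 0 = unknown, larger = more trusted training entry
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = values.shape
    U = rng.normal(scale=0.1, size=(n_rows, rank))  # solute features
    V = rng.normal(scale=0.1, size=(n_cols, rank))  # solvent features
    ridge = reg * np.eye(rank)
    for _ in range(n_iters):
        # Update each solute's features with the solvent features held fixed.
        for i in range(n_rows):
            Vw = V * weights[i][:, None]
            U[i] = np.linalg.solve(Vw.T @ V + ridge, Vw.T @ values[i])
        # Update each solvent's features with the solute features held fixed.
        for j in range(n_cols):
            Uw = U * weights[:, j][:, None]
            V[j] = np.linalg.solve(Uw.T @ U + ridge, Uw.T @ values[:, j])
    return U @ V.T

# Toy data (hypothetical): a rank-2 matrix stands in for the true values.
rng = np.random.default_rng(1)
truth = rng.normal(size=(8, 2)) @ rng.normal(size=(2, 6))

exp_mask = rng.random(truth.shape) < 0.3                 # sparse "experimental" entries
syn_mask = ~exp_mask & (rng.random(truth.shape) < 0.5)   # "synthetic" entries elsewhere

values = np.zeros_like(truth)
values[exp_mask] = truth[exp_mask]
# Synthetic entries are noisier, mimicking predictions from another model.
values[syn_mask] = truth[syn_mask] + rng.normal(scale=0.3, size=syn_mask.sum())

# Assumed weighting: trust experimental entries more than synthetic ones.
weights = 1.0 * exp_mask + 0.3 * syn_mask

pred = complete_matrix(values, weights)
print(np.round(pred, 2))
```

Setting the synthetic weight to zero reduces the sketch to completion on experimental entries alone, which gives a simple way to compare the two training sets on held-out entries.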
About the journal:
The Journal of Physical Chemistry A is devoted to reporting new and original experimental and theoretical basic research of interest to physical chemists, biophysical chemists, and chemical physicists.