A Machine Learning Approach for Predicting the Pure-Component Surface Tension of Atmospherically Relevant Organic Compounds
Ryan Schmedding, Mees Franssen and Andreas Zuend*
ACS ES&T Air, volume 2, issue 5, pages 808–823. Published 2025-04-08.
https://pubs.acs.org/doi/10.1021/acsestair.4c00291
Abstract
Atmospheric aerosols are complex mixtures of highly functionalized organic compounds, water, inorganic electrolytes, metals, and carbonaceous species. The surface properties of atmospheric aerosol particles can influence several of their chemical and physical impacts, including their hygroscopic growth, aerosol–cloud interactions, and heterogeneous chemical reactions. The effects of the various compounds within a particle on its surface tension depend in part on the pure-component surface tensions. For many of the myriad organic compounds of interest, experimental pure-component surface tension data at tropospheric temperatures are lacking, thus requiring the development and application of property estimation methods. In this work, a compiled database of experimental pure-component surface tension data, covering a wide range of organic compound classes and temperatures, is used to train four different types of machine learning models to predict the temperature-dependent pure-component surface tensions of atmospherically relevant organic compounds. The trained models process input information about the temperature and the molecular structure of an organic compound, initially in the form of a Simplified Molecular Input Line Entry System (SMILES) string, to enable predictions. Our quantitative model assessment shows that extreme gradient-boosted descent along with Molecular ACCess System (MACCS) key descriptors of molecular structure provided the best balance of derived input complexity and model performance, resulting in a root-mean-square error (RMSE) of ∼1 mJ m⁻² in pure-component surface tension. Additionally, a simplified model based on molar mass, elemental ratios, and temperature as inputs was developed for use in applications for which molecular structure information is incomplete (RMSE of ∼2 mJ m⁻²). We demonstrate that including predicted pure-component surface tension values in thermodynamically rigorous bulk–surface partitioning calculations may substantially modify the critical supersaturations necessary for aerosol activation into cloud droplets.
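
As an illustration of the inputs named for the simplified model (molar mass, elemental ratios, and temperature) and of the RMSE metric quoted above, the following is a minimal sketch and not the authors' implementation. The abstract does not say which elemental ratios are used, so O:C and H:C are assumed here; the helper names `parse_formula`, `simplified_features`, and `rmse` are hypothetical, and the formula parser handles only simple formulas without brackets or charges.

```python
import math
import re

# Rounded standard atomic masses in g/mol for a few elements common in
# organic aerosol compounds (illustrative subset, not from the paper).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999, "N": 14.007, "S": 32.06}


def parse_formula(formula):
    """Count atoms in a simple molecular formula such as 'C2H6O'."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def simplified_features(formula, temperature_k):
    """Return [molar mass, O:C, H:C, temperature] as a feature vector.

    The choice of O:C and H:C as the elemental ratios is an assumption.
    """
    counts = parse_formula(formula)
    n_carbon = counts.get("C", 0)
    if n_carbon == 0:
        raise ValueError("formula must contain carbon to form X:C ratios")
    molar_mass = sum(ATOMIC_MASS[element] * n for element, n in counts.items())
    return [
        molar_mass,
        counts.get("O", 0) / n_carbon,
        counts.get("H", 0) / n_carbon,
        temperature_k,
    ]


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Example: ethanol at 298.15 K, plus an RMSE on made-up placeholder numbers.
features = simplified_features("C2H6O", 298.15)
error = rmse([22.0, 30.0], [21.0, 32.0])
```

A feature vector of this kind could be passed to any regression model; the numbers given to `rmse` in the example are placeholders for illustration and are not data from the study.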