A Machine Learning Approach for Predicting the Pure-Component Surface Tension of Atmospherically Relevant Organic Compounds

Ryan Schmedding, Mees Franssen, Andreas Zuend
ACS ES&T Air, 2(5), 808-823. Publication date: 2025-04-08. DOI: 10.1021/acsestair.4c00291
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12071373/pdf/
Atmospheric aerosols are complex mixtures of highly functionalized organic compounds, water, inorganic electrolytes, metals, and carbonaceous species. The surface properties of atmospheric aerosol particles can influence several of their chemical and physical impacts, including hygroscopic growth, aerosol-cloud interactions, and heterogeneous chemical reactions. How the various compounds within a particle affect its surface tension depends in part on their pure-component surface tensions. For many of the myriad organic compounds of interest, however, experimental pure-component surface tension data at tropospheric temperatures are lacking, which requires the development and application of property estimation methods. In this work, a compiled database of experimental pure-component surface tension data, covering a wide range of organic compound classes and temperatures, is used to train four different types of machine learning models to predict the temperature-dependent pure-component surface tensions of atmospherically relevant organic compounds. The trained models take as input the temperature and the molecular structure of an organic compound, the latter initially supplied as a Simplified Molecular Input Line Entry System (SMILES) string. Our quantitative model assessment shows that extreme gradient boosting combined with Molecular ACCess System (MACCS) key descriptors of molecular structure provided the best balance between the complexity of the derived inputs and model performance, with a root-mean-square error (RMSE) of ∼1 mJ m⁻² in pure-component surface tension. Additionally, a simplified model that takes molar mass, elemental ratios, and temperature as inputs was developed for applications in which molecular structure information is incomplete (RMSE of ∼2 mJ m⁻²). We demonstrate that including predicted pure-component surface tension values in thermodynamically rigorous bulk-surface partitioning calculations may substantially modify the critical supersaturations necessary for aerosol activation into cloud droplets.
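The accuracy figures above are root-mean-square errors: the square root of the mean squared difference between predicted and measured values, so the metric carries the units of surface tension (mJ m⁻²). A minimal sketch of the metric follows; the function name and the sample values are illustrative only and are not taken from the paper's data.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square error between predictions and measurements.

    Both sequences must use the same units (e.g. mJ m^-2); the result
    is returned in those units.
    """
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

For example, with hypothetical predictions of 30.0 and 32.0 mJ m⁻² against measurements of 31.0 mJ m⁻² each, `rmse([30.0, 32.0], [31.0, 31.0])` returns 1.0, i.e. an error of the same size as the ∼1 mJ m⁻² reported for the best model.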
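The simplified model needs only molar mass, elemental ratios, and temperature, which can be derived from a molecular formula alone. The abstract does not say which elemental ratios are used, so the sketch below assumes O:C and H:C (ratios commonly used for atmospheric organics) and computes molar mass from conventional standard atomic weights. The function names and the dictionary layout are hypothetical; the sketch only assembles such an input vector and does not reproduce the trained model.

```python
import re
from typing import Dict

# Conventional standard atomic weights in g mol^-1. Only elements common in
# atmospheric organics are listed (an assumption of this sketch).
ATOMIC_WEIGHTS: Dict[str, float] = {
    "C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06,
}


def parse_formula(formula: str) -> Dict[str, int]:
    """Count atoms in a simple formula without brackets, e.g. 'C6H12O6'."""
    counts: Dict[str, int] = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if element not in ATOMIC_WEIGHTS:
            raise ValueError(f"unsupported element: {element}")
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def simple_features(formula: str, temperature_k: float) -> Dict[str, float]:
    """Hypothetical structure-free input vector: molar mass (g mol^-1),
    assumed O:C and H:C ratios, and temperature (K)."""
    counts = parse_formula(formula)
    n_carbon = counts.get("C", 0)
    if n_carbon == 0:
        raise ValueError("ratios relative to carbon need at least one C atom")
    molar_mass = sum(ATOMIC_WEIGHTS[el] * n for el, n in counts.items())
    return {
        "molar_mass": molar_mass,
        "O_to_C": counts.get("O", 0) / n_carbon,
        "H_to_C": counts.get("H", 0) / n_carbon,
        "temperature": temperature_k,
    }
```

For glucose, `simple_features("C6H12O6", 298.15)` gives a molar mass of about 180.16 g mol⁻¹ with O:C = 1.0 and H:C = 2.0. Working from a formula rather than a SMILES string is what makes this kind of input usable when the molecular structure is only partly known.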
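Why surface tension matters for activation can be seen even in classic Köhler theory with a single, constant droplet surface tension, which is a much simpler treatment than the bulk-surface partitioning calculations described above. In the approximation ln S = A/D − B/D³, the Kelvin coefficient A is proportional to σ, and the maximum of the curve gives ln S_c = sqrt(4A³ / (27B)), which is close to the critical supersaturation s_c = S_c − 1 when it is small. At fixed solute term B, s_c therefore scales roughly as σ^(3/2). The sketch below assumes constant water properties near 298 K and a hypothetical B; it illustrates the sensitivity only and is not the authors' model.

```python
import math

R = 8.314462618   # J mol^-1 K^-1, molar gas constant
M_W = 0.018015    # kg mol^-1, molar mass of water
RHO_W = 997.0     # kg m^-3, density of water near 298 K (assumed constant)


def kelvin_coefficient(sigma: float, temperature: float) -> float:
    """Kelvin coefficient A = 4*sigma*M_w/(R*T*rho_w) in metres.

    sigma is the droplet surface tension in J m^-2 (1 mJ m^-2 = 1e-3 J m^-2).
    """
    return 4.0 * sigma * M_W / (R * temperature * RHO_W)


def ln_saturation(diameter: float, a: float, b: float) -> float:
    """Simplified Koehler curve ln S = a/D - b/D**3 (b lumps the solute term, m^3)."""
    return a / diameter - b / diameter**3


def critical_ln_saturation(a: float, b: float) -> float:
    """Maximum of ln_saturation, reached at D_c = sqrt(3b/a)."""
    return math.sqrt(4.0 * a**3 / (27.0 * b))


if __name__ == "__main__":
    b = 1.0e-22  # hypothetical solute term in m^3; it cancels in the ratio below
    temperature = 298.15
    s_water = critical_ln_saturation(kelvin_coefficient(0.072, temperature), b)
    s_lower = critical_ln_saturation(kelvin_coefficient(0.050, temperature), b)
    # The ratio equals (0.050/0.072)**1.5 = 125/216, about 0.58, whatever b is.
    print(f"ratio of critical supersaturations: {s_lower / s_water:.3f}")
```

Lowering σ from 72 to 50 mJ m⁻² in this simplified picture reduces the critical supersaturation by roughly 42 %, which is why uncertainties of a few mJ m⁻² in pure-component surface tension can propagate into activation predictions.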