SMILES-driven machine learning for high-throughput investigation of anti-corrosion materials
Authors: Muhamad Akrom, Harun Al Azies, Wise Herowati, Totok Sutojo, Supriadi Rustad, Hermawan Kresno Dipojono, Hideaki Kasai
Journal: Chemometrics and Intelligent Laboratory Systems, volume 263, Article 105441 (impact factor 3.7; JCR Q2, Automation & Control Systems)
Publication date: 2025-05-25
DOI: 10.1016/j.chemolab.2025.105441
URL: https://www.sciencedirect.com/science/article/pii/S0169743925001261
This study examines whether a machine learning (ML) approach that uses the simplified molecular input line entry system (SMILES) representation as its sole input feature can predict the corrosion inhibition efficiency (CIE) of pyridine-quinoline compounds, replacing the various quantum chemical properties (QCP) otherwise used as features. Molecular access system (MACCS) fingerprints simplify the processing of molecular structures and enhance data efficiency. The ML approach, notably the gradient boosting (GB) model, shows strong predictive performance, with R² and RMSE values of 0.92 and 0.07, respectively. This is comparable to predictions based on 20 QCP features, which yield R² and RMSE values of 0.90 and 0.08, respectively. The study substantiates SMILES as a robust single feature for accurate CIE prediction and reveals a moderate correlation between SMILES-represented structures and CIE values. These results underscore the effectiveness of SMILES-based ML in assessing corrosion inhibition potential and advance predictive modeling in corrosion science. Integrating ML with SMILES notation offers an efficient way to evaluate the corrosion inhibition capacity of diverse molecular structures.
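The two metrics quoted in the abstract can be illustrated with a minimal sketch. It assumes the standard definitions (RMSE as the square root of the mean squared residual, R² as one minus the ratio of residual to total sum of squares); the abstract does not say how the authors computed them. The function names and the `observed`/`predicted` values below are hypothetical placeholders, not data from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical CIE values expressed as fractions (NOT taken from the paper).
observed = [0.80, 0.90, 0.70, 0.60]
predicted = [0.75, 0.90, 0.75, 0.60]

print(f"RMSE = {rmse(observed, predicted):.3f}")
print(f"R^2  = {r2_score(observed, predicted):.3f}")
```

A higher R² and a lower RMSE both indicate a closer fit, which is the sense in which the abstract compares the SMILES-based model (0.92, 0.07) with the 20-feature QCP model (0.90, 0.08).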
About the journal:
Chemometrics and Intelligent Laboratory Systems publishes original research papers, short communications, reviews, tutorials and Original Software Publications reporting on development of novel statistical, mathematical, or computer techniques in Chemistry and related disciplines.
Chemometrics is the chemical discipline that uses mathematical and statistical methods to design or select optimal procedures and experiments, and to provide maximum chemical information by analysing chemical data.
The journal deals with the following topics:
1) Development of new statistical, mathematical and chemometrical methods for Chemistry and related fields (Environmental Chemistry, Biochemistry, Toxicology, System Biology, -Omics, etc.)
2) Novel applications of chemometrics to all branches of Chemistry and related fields (typical domains of interest are: process data analysis, experimental design, data mining, signal processing, supervised modelling, decision making, robust statistics, mixture analysis, multivariate calibration etc.) Routine applications of established chemometrical techniques will not be considered.
3) Development of new software that provides novel tools or truly advances the use of chemometrical methods.
4) Well characterized data sets to test performance for the new methods and software.
The journal complies with the International Committee of Medical Journal Editors' Uniform Requirements for Manuscripts.