Christopher D. Stubbs, Yeonjoon Kim, Ethan C. Quinn, Raúl Pérez-Soto, Eugene Y.-X. Chen and Seonah Kim
Predicting homopolymer and copolymer solubility through machine learning
Digital Discovery, vol. 2, pp. 424-437. Published 2024-12-24. DOI: 10.1039/D4DD00290C
https://pubs.rsc.org/en/content/articlelanding/2025/dd/d4dd00290c
Abstract
Polymer solubility matters in many diverse fields, including microprocessor fabrication, environmental conservation, paint formulation, and drug delivery, yet it remains under-explored relative to its importance. This is evident in the scarcity of solvent-based systems for recycling plastics, despite the need for efficient and selective methods amid the looming plastics and climate crises. To address this need for better predictive tools, this work examines classical and deep machine learning (ML) models for predicting the categorical solubility of homopolymers and copolymers, with model architectures including random forest (RF), decision tree (DT), naive Bayes, AdaBoost, and graph neural networks (GNNs). In 5-fold cross-validation studies, our models achieve high accuracy on unseen polymer–solvent systems for both homopolymers (82%, RF) and copolymers (92%, RF). The relevance and applicability of our homopolymer models are then verified through in-house experiments examining the solubility of common commercial plastics, followed by an explainable AI (XAI) analysis using Shapley Additive Explanations (SHAP), which explores the relative contribution of each feature to model predictions. We then apply our homopolymer solubility prediction model to the removal of unwanted or hazardous additives from polyethylene (PE) and polystyrene (PS) waste. This work demonstrates the feasibility of using ML to predict homopolymer solubility, provides novel ML models for predicting copolymer solubility, and explains homopolymer model predictions before applying the explained model to a globally relevant waste challenge.
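The abstract reports accuracy on unseen polymer–solvent systems under 5-fold cross-validation but does not say how the folds were built. The sketch below shows one way such a split could be set up, assuming that several records may share the same polymer–solvent pair and that all records for a pair should land in the same fold. The function name `grouped_k_fold`, the record layout, and the toy data are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def grouped_k_fold(records, k=5, seed=0):
    """Yield (train_idx, test_idx) index lists for k folds.

    Records are grouped by (polymer, solvent), so every pair appears in
    exactly one test fold and never in the matching training set.
    """
    groups = defaultdict(list)
    for i, (polymer, solvent, _label) in enumerate(records):
        groups[(polymer, solvent)].append(i)

    # Shuffle the unique pairs reproducibly, then deal them into k folds.
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)
    folds = [keys[i::k] for i in range(k)]

    for fold in folds:
        test_idx = [i for key in fold for i in groups[key]]
        held_out = set(test_idx)
        train_idx = [i for i in range(len(records)) if i not in held_out]
        yield train_idx, test_idx


# Toy records: (polymer, solvent, label). Labels are placeholders (0),
# not real solubility measurements.
polymers = ["PE", "PS", "PVC", "PET", "PMMA"]
solvents = ["toluene", "water", "acetone", "hexane"]
records = [(p, s, 0) for p in polymers for s in solvents]
records += records[:5]  # repeated records for a few pairs

splits = list(grouped_k_fold(records, k=5))
for fold_no, (train_idx, test_idx) in enumerate(splits):
    print(f"fold {fold_no}: {len(train_idx)} train, {len(test_idx)} test")
```

In practice a classifier such as a random forest would be fit on each training index set and scored on the matching test set, with the per-fold accuracies averaged; that step is omitted here to keep the sketch free of third-party dependencies.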