Machine learning prediction of heat capacity of polymers as a function of temperature
Kazuhiko Ishikiriyama
Polymer, published 2025-10-01 (impact factor 4.5; JCR Q2, Polymer Science)
DOI: 10.1016/j.polymer.2025.129171 (https://doi.org/10.1016/j.polymer.2025.129171)
Citation count: 0
Abstract
Machine learning models were developed using the high-quality ATHAS (Advanced Thermal Analysis System) data bank to predict the constant-pressure heat capacity (C_P) of polymers at 10 K intervals from 10 to 500 K. Molecular fingerprints (FPs) were used as features, specifically circular Morgan fingerprints with a bond diameter of 4 derived from the repeating structural units of the polymers. For polymers contained in the ATHAS data bank (e.g., polypropylene and polyamide 6), the predicted C_P values showed mean relative errors (MREs) within ±3%. In contrast, for polymers absent from the data bank, including poly(p-dioxanone), poly(N-vinylpyrrolidone), and starch, a positive correlation was observed between the MRE and the number of missing substructures (N_ms), where a missing substructure is a hashed identifier present in the target polymer but absent from the ATHAS-derived feature space. Using this correlation, the C_P predictions for polymers with N_ms > 0 were adjusted, reducing the MREs to within ±3%. To improve accuracy further, additional models employing alternative FPs were built: polyBERT FP, generated from a pre-trained BERT-based chemical language model, and OMG FP and SMiPoly FP, derived from the virtual polymer libraries OMG and SMiPoly. For polymers with N_ms > 0, all alternative FPs yielded lower MREs than the uncorrected Morgan fingerprints. The lowest MREs were achieved with a hybrid FP constructed from OMG and 10% of the SMiPoly dataset, demonstrating enhanced extrapolative performance. Because of computational limits, molecular dynamics struggles to capture the temperature dependence of C_P, whereas trained machine learning models may predict it rapidly for many polymers, suggesting their potential as a practical alternative.
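The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation, and everything in it beyond the abstract is an assumption: the function names are hypothetical, the MRE is taken to be a signed percentage averaged over the temperature grid, and the correction assumes a linear trend between MRE and N_ms, because the abstract does not state the functional form. The hashed identifiers are placeholder integers; in practice they would come from a fingerprinting toolkit.

```python
from typing import Iterable, List, Sequence

# Temperature grid stated in the abstract: 10 K steps from 10 K to 500 K.
TEMPERATURES_K = list(range(10, 501, 10))


def count_missing_substructures(target_ids: Iterable[int],
                                training_ids: Iterable[int]) -> int:
    """N_ms: distinct hashed identifiers in the target polymer that do not
    occur anywhere in the training-set feature space."""
    return len(set(target_ids) - set(training_ids))


def mean_relative_error(predicted: Sequence[float],
                        reference: Sequence[float]) -> float:
    """Signed mean relative error in percent (assumed definition)."""
    if len(predicted) != len(reference) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum((p - r) / r for p, r in zip(predicted, reference))
    return 100.0 * total / len(reference)


def correct_prediction(predicted: Sequence[float], n_ms: int,
                       slope: float, intercept: float = 0.0) -> List[float]:
    """Rescale predictions by the MRE expected from an ASSUMED linear trend,
    MRE ~ slope * N_ms + intercept (in percent)."""
    expected_mre = slope * n_ms + intercept
    return [p / (1.0 + expected_mre / 100.0) for p in predicted]


if __name__ == "__main__":
    # Hypothetical identifiers and heat capacities, for illustration only.
    training_ids = {101, 202, 303, 404}
    target_ids = [101, 202, 555, 777, 555]
    n_ms = count_missing_substructures(target_ids, training_ids)

    reference = [10.0, 20.0, 40.0]
    predicted = [11.0, 22.0, 44.0]
    print("N_ms:", n_ms)
    print("MRE before (%):", mean_relative_error(predicted, reference))

    corrected = correct_prediction(predicted, n_ms, slope=5.0)
    print("MRE after (%):", mean_relative_error(corrected, reference))
```

In this toy case the slope of 5% per missing substructure is chosen so that the correction cancels the error exactly. With real data the slope and intercept would have to be fitted to polymers whose reference C_P values are known.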
About the journal:
Polymer is an interdisciplinary journal dedicated to publishing innovative and significant advances in Polymer Physics, Chemistry and Technology. We welcome submissions on polymer hybrids, nanocomposites, characterisation and self-assembly. Polymer also publishes work on the technological application of polymers in energy and optoelectronics.
The main scope covers, but is not limited to, the following core areas:
Polymer Materials
Nanocomposites and hybrid nanomaterials
Polymer blends, films, fibres, networks and porous materials
Physical Characterization
Characterisation, modelling and simulation of molecular and materials properties in bulk, solution, and thin films
Polymer Engineering
Advanced multiscale processing methods
Polymer Synthesis, Modification and Self-assembly
Including designer polymer architectures, mechanisms and kinetics, and supramolecular polymerization
Technological Applications
Polymers for energy generation and storage
Polymer membranes for separation technology
Polymers for opto- and microelectronics.