Ji-Feng Wang, Yu-Bo Sun, Qiu-Tong Chen, Fei-Fan Ji, Yuan-Yuan Song, Meng-Yuan Ruan, Ying Wang
Chinese Journal of Polymer Science, 2025, 43(10): 1749-1760. Published 10 September 2025. DOI: 10.1007/s10118-025-3402-y (https://link.springer.com/article/10.1007/s10118-025-3402-y)
OpenPoly: A Polymer Database Empowering Benchmarking and Multi-property Predictions
Advancing the integration of artificial intelligence and polymer science requires high-quality, open-source, and large-scale datasets. However, existing polymer databases often suffer from data sparsity, missing polymer-property labels, and limited accessibility, which hinders systematic modeling across property prediction tasks. Here, we present OpenPoly, a curated database of experimental polymer data built through extensive literature mining and manual validation; it comprises 3985 unique polymer-property data points spanning 26 key properties. We further develop a multi-task benchmarking framework that evaluates property prediction across four encoding methods and eight representative models. Our results show that an optimized degree-of-polymerization encoding combined with Morgan fingerprints offers the best trade-off between computational cost and accuracy. Under data-scarce conditions, XGBoost outperforms deep learning models on key properties such as dielectric constant, glass transition temperature, melting point, and mechanical strength, achieving R² scores of 0.65 to 0.87. To demonstrate the practical utility of the database, we propose candidate polymers for two energy-relevant applications: high-temperature polymer dielectrics and fuel cell membranes. By offering a consistent and accessible database and benchmark, OpenPoly paves the way for more accurate polymer-property modeling and fosters data-driven advances in polymer genome engineering.
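The abstract does not spell out how the degree-of-polymerization (DP) encoding works. One common way to give a fingerprint some chain context is to expand the repeat unit into a short oligomer of n units before fingerprinting. The minimal sketch here illustrates only that oligomer-building step, at the string level. The helper names `oligomer_smiles` and `dp_series`, the `*...*` repeat-unit convention, and the idea of treating n as a tunable parameter are assumptions for illustration, not OpenPoly's published method.

```python
def oligomer_smiles(repeat_unit: str, n: int) -> str:
    """Hypothetical helper: build a hydrogen-capped n-mer SMILES string.

    Assumes the repeat unit is written as '*...*', with exactly one bare '*'
    attachment point at the very start and one at the very end, single bonds
    at both attachment points, and ring-closure digits opened and closed
    inside the unit.
    """
    if n < 1:
        raise ValueError("degree of polymerization n must be >= 1")
    if repeat_unit.count("*") != 2 or not (
        repeat_unit.startswith("*") and repeat_unit.endswith("*")
    ):
        raise ValueError("expected a repeat unit of the form '*...*'")
    core = repeat_unit[1:-1]  # strip head and tail attachment points
    if not core:
        raise ValueError("repeat unit has no atoms")
    bond_symbols = "-=#$:/\\"
    if core[0] in bond_symbols or core[-1] in bond_symbols:
        raise ValueError("only implicit single bonds at attachment points are supported")
    # In SMILES, the first atom of the next copy bonds to the "current" atom
    # at the end of the previous copy, which is exactly the atom that carried
    # the tail '*'. Dropping the '*' tokens leaves implicit hydrogens as end caps.
    return core * n


def dp_series(repeat_unit: str, degrees: list[int]) -> dict[int, str]:
    """Hypothetical helper: oligomer SMILES for several candidate DP values."""
    return {n: oligomer_smiles(repeat_unit, n) for n in degrees}
```

For example, `oligomer_smiles("*CC(C)*", 3)` yields `CC(C)CC(C)CC(C)`, a hydrogen-terminated propylene trimer. The resulting strings could then be passed to a cheminformatics toolkit such as RDKit to compute Morgan fingerprints, and n could be varied to see how the chosen DP affects cost and accuracy. Plain concatenation only works for the simple linear topology assumed above; other repeat-unit layouts would need a graph-based approach.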
About the journal:
Chinese Journal of Polymer Science (CJPS) is a monthly English-language journal sponsored by the Chinese Chemical Society and the Institute of Chemistry, Chinese Academy of Sciences. CJPS is edited by a distinguished Editorial Board headed by Professor Qi-Feng Zhou and supported by an International Advisory Board that includes many prominent, active polymer scientists from around the world. The journal was first published in 1983 under the title Polymer Communications and has carried its current name since 1985.
CJPS is a peer-reviewed journal dedicated to the timely publication of original research ideas and results in polymer science. Issues may include regular papers, rapid communications, notes, and feature articles. As a leading English-language polymer journal in China, CJPS reflects new achievements from laboratories across China; it also publishes papers submitted by scientists from other countries and regions, reflecting the journal's international character.