Polyuniverse: generation of a large-scale polymer library using rule-based polymerization reactions for polymer informatics†

IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY
Tianle Yue, Jianxin He and Ying Li
Digital Discovery, volume 12, pages 2465-2478. Published 2024-10-09. DOI: 10.1039/D4DD00196F
https://pubs.rsc.org/en/content/articlelanding/2024/dd/d4dd00196f

Abstract

Recent advances in machine learning have transformed polymer research, leading to the rapid adoption of diverse computational techniques for de novo molecular design. A crucial step in these workflows is expanding the pool of candidate polymer structures, because the number of known real polymer structures is very limited. Small-molecule databases, by contrast, are vast and offer extensive opportunities for designing new molecules, as in drug discovery. In this study, we collected a large set of small-molecule compounds from GDB-17, GDB-13, and PubChem and selected polymerization reaction pathways for eight types of polymers: polyimide, polyolefin, polyester, polyamide, polyurethane, epoxy, polybenzimidazole (PBI), and vitrimer. Together, these small-molecule datasets and polymerization reactions enabled us to generate hundreds of quadrillions of hypothetical polymer structures. For each of the eight polymers, along with one promising copolymer, poly(imide-imine), we randomly generated over one million hypothetical structures, except for PBI, for which we created 10 000 structures. Chemical space visualization using t-distributed stochastic neighbor embedding (t-SNE) and synthetic accessibility scores were used to assess the feasibility of synthesizing these new polymers. Customized feedforward neural network models predicted thermal, mechanical, and gas permeation properties for both real and hypothetical polymers. The results show that many hypothetical polymers, especially polyimides, exhibit significant potential, often surpassing real polymers in predicted performance, particularly for high-temperature applications and gas separation.

Our findings highlight the potential of large-scale hypothetical polymer libraries for materials discovery and design. These libraries not only aid in identifying promising polymer materials through high-throughput screening but also provide valuable datasets for training advanced machine learning models, such as large language models. This research also demonstrates the power of data-driven approaches in polymer science, paving the way for the development of next-generation polymeric materials with superior properties for diverse industrial applications.
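
The rule-based generation described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a hypothetical string-level rule for polyamide formation (diamine plus diacid, with water lost), hypothetical monomer pools, and SMILES written so that the reactive groups sit at both ends of the string. The function names `polyamide_repeat_unit`, `library_size`, and `sample_library` are invented for illustration. The sketch shows how the library size grows as the product of the monomer pool sizes, and how a random subset can be drawn without enumerating every combination.

```python
import random

# Hypothetical toy monomer pools. Each SMILES is written so that the
# reactive groups sit at both ends of the string (an assumption of this sketch).
DIAMINES = ["NCCCCCCN", "NCCN", "NCCOCCN"]
DIACIDS = ["OC(=O)CCCCC(=O)O", "OC(=O)CCC(=O)O", "OC(=O)c1ccc(cc1)C(=O)O"]

ACID_HEAD, ACID_TAIL = "OC(=O)", "C(=O)O"


def polyamide_repeat_unit(diamine: str, diacid: str) -> str:
    """Toy rule: join a diamine and a diacid into a repeat unit with '*' end markers."""
    if not (len(diamine) >= 2 and diamine.startswith("N") and diamine.endswith("N")):
        raise ValueError(f"not a terminal diamine in the assumed form: {diamine}")
    if not (
        len(diacid) >= len(ACID_HEAD) + len(ACID_TAIL)
        and diacid.startswith(ACID_HEAD)
        and diacid.endswith(ACID_TAIL)
    ):
        raise ValueError(f"not a terminal diacid in the assumed form: {diacid}")
    # Drop the two acid OH groups; each amine nitrogen keeps one H (implicit in SMILES).
    core = diacid[len(ACID_HEAD):-len(ACID_TAIL)]
    return f"*{diamine}C(=O){core}C(=O)*"


def library_size(diamines, diacids) -> int:
    """Number of distinct (diamine, diacid) pairs the rule can be applied to."""
    return len(diamines) * len(diacids)


def sample_library(diamines, diacids, n: int, seed: int = 0):
    """Draw n distinct pairs at random by index, without building the full library."""
    total = library_size(diamines, diacids)
    if n > total:
        raise ValueError("requested more structures than the rule can generate")
    rng = random.Random(seed)
    picks = rng.sample(range(total), n)  # indices into the implicit pair grid
    return [
        polyamide_repeat_unit(diamines[k // len(diacids)], diacids[k % len(diacids)])
        for k in picks
    ]


# Hexamethylenediamine + adipic acid gives the nylon 6,6 repeat unit.
print(polyamide_repeat_unit("NCCCCCCN", "OC(=O)CCCCC(=O)O"))
print(library_size(DIAMINES, DIACIDS), "possible pairs")
print(sample_library(DIAMINES, DIACIDS, 3))
```

String slicing only works for the restricted SMILES form assumed here; a production pipeline would more plausibly apply reaction templates with a cheminformatics toolkit. The index-based sampling is the part that scales: because pairs are drawn from `range(total)`, the cost depends on the number of structures requested and not on the size of the full combinatorial space.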

