Multimodal Machine Learning with Large Language Embedding Model for Polymer Property Prediction

Tianren Zhang* and Dai-Bei Yang
Chemistry of Materials, 2025, 37 (18), 7002–7013. Published 2025-08-31. DOI: 10.1021/acs.chemmater.5c00940

Contemporary large language models (LLMs), such as GPT-4 and Llama, have harnessed extensive computational power and diverse text corpora to achieve remarkable proficiency in interpreting and generating domain-specific content, including in materials science. To leverage the domain knowledge embedded within these models, we propose a simple yet effective multimodal architecture, PolyLLMem. By integrating text embeddings from Llama 3 with molecular structure embeddings from Uni-Mol, PolyLLMem enables accurate prediction of polymer properties. Low-Rank Adaptation (LoRA) layers were integrated into the model during the property prediction stage to adapt the Llama 3 and Uni-Mol embeddings to our limited polymer data set, thereby enhancing their chemical relevance for polymer SMILES representation. This balanced fusion of fine-tuned textual and structural information enables PolyLLMem to robustly predict a variety of polymer properties despite the scarcity of training data. Its performance is comparable to, and in some cases exceeds, that of graph-based or transformer-based models that typically require pretraining on millions of polymer samples. These findings demonstrate that LLMs, such as Llama, can effectively capture chemical information encoded in polymer SMILES (PSMILES), and underscore the efficacy of multimodal fusion of LLM embeddings and molecular structure embeddings in overcoming data scarcity and accelerating the discovery of advanced polymeric materials.
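The core idea, projecting a frozen text embedding and a frozen structure embedding through LoRA-adapted linear maps, then fusing them for property regression, can be illustrated with a minimal sketch. All dimensions, the LoRA rank, and the concatenation-plus-linear-head design below are illustrative assumptions, not details taken from the paper; random vectors stand in for the real Llama 3 and Uni-Mol encoder outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder embeddings standing in for the real encoders: a Llama 3
# text embedding of a PSMILES string and a Uni-Mol structure embedding.
# The sizes (4096 and 512) are illustrative only.
text_emb = rng.standard_normal(4096)
struct_emb = rng.standard_normal(512)

def lora_linear(x, W, A, B, alpha=8):
    """Frozen linear map W plus a trainable low-rank update (B @ A),
    scaled by alpha/r as in standard LoRA. A: (r, d_in), B: (d_out, r)."""
    r = A.shape[0]
    return W @ x + (alpha / r) * (B @ (A @ x))

d_proj = 256  # shared projection width (assumption)
r = 8         # LoRA rank (assumption)

# Frozen projection weights plus low-rank factors for each modality.
# B starts at zero, so the adapters initially leave the projections unchanged.
W_t = rng.standard_normal((d_proj, text_emb.size)) / np.sqrt(text_emb.size)
A_t = rng.standard_normal((r, text_emb.size)) * 0.01
B_t = np.zeros((d_proj, r))

W_s = rng.standard_normal((d_proj, struct_emb.size)) / np.sqrt(struct_emb.size)
A_s = rng.standard_normal((r, struct_emb.size)) * 0.01
B_s = np.zeros((d_proj, r))

# Adapt each embedding, fuse by concatenation, and regress a scalar property.
h_text = lora_linear(text_emb, W_t, A_t, B_t)
h_struct = lora_linear(struct_emb, W_s, A_s, B_s)
fused = np.concatenate([h_text, h_struct])  # (2 * d_proj,)

w_head = rng.standard_normal(fused.size) / np.sqrt(fused.size)
prediction = float(w_head @ fused)  # e.g., a glass-transition temperature
print(fused.shape, prediction)
```

In training, only the low-rank factors (A, B) and the head would be updated, which is what lets a small polymer data set adapt embeddings from models pretrained at a much larger scale.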
About the journal:
The journal Chemistry of Materials focuses on publishing original research at the intersection of materials science and chemistry. The studies published in the journal involve chemistry as a prominent component and explore topics such as the design, synthesis, characterization, processing, understanding, and application of functional or potentially functional materials. The journal covers various areas of interest, including inorganic and organic solid-state chemistry, nanomaterials, biomaterials, thin films and polymers, and composite/hybrid materials. The journal particularly seeks papers that highlight the creation or development of innovative materials with novel optical, electrical, magnetic, catalytic, or mechanical properties. It is essential that manuscripts on these topics have a primary focus on the chemistry of materials and represent a significant advancement compared to prior research. Before external reviews are sought, submitted manuscripts undergo a review process by a minimum of two editors to ensure their appropriateness for the journal and the presence of sufficient evidence of a significant advance that will be of broad interest to the materials chemistry community.